From zocca.marco at gmail.com Sun Jan 3 05:59:41 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Sun, 3 Jan 2016 12:59:41 +0100 Subject: [petsc-users] installation on cloud platform Message-ID: Dear all, has anyone here tried/managed to install PETSc on e.g. Amazon AWS or the Google Compute Engine? I believe some extra components are needed for coordination, e.g. Kubernetes or Mesos (in turn requiring that the library be compiled within some sort of container, e.g. Docker), but I'm a bit lost amid all the options. Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by PETSc compatible with those platforms? Thank you in advance, Marco From knepley at gmail.com Sun Jan 3 07:07:46 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 3 Jan 2016 07:07:46 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca wrote: > Dear all, > > has anyone here tried/managed to install PETSc on e.g. Amazon AWS or > the Google Compute Engine? > > I believe some extra components are needed for coordination, e.g. > Kubernetes or Mesos (in turn requiring that the library be compiled > within some sort of container, e.g. Docker), but I'm a bit lost amid > all the options. > I have no idea what those even do. > Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by > PETSc compatible with those platforms? > There are a bunch of papers documenting MPI performance on AWS. We just use vanilla MPI, so you request a configuration that has it installed. Matt > Thank you in advance, > > Marco > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From stali at geology.wisc.edu Mon Jan 4 09:48:05 2016 From: stali at geology.wisc.edu (Tabrez Ali) Date: Mon, 04 Jan 2016 09:48:05 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: <568A9435.40103@geology.wisc.edu> Or you can install everything yourself. On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by $ cd $ ssh-keygen -t rsa $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys After that the usual stuff, e.g., $ sudo apt-get update $ sudo apt-get upgrade $ sudo apt-get install gcc gfortran g++ cmake wget $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 $ export PETSC_ARCH=arch-linux2-c-opt $ make all $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin Tabrez On 01/03/2016 07:07 AM, Matthew Knepley wrote: > On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca > wrote: > > Dear all, > > has anyone here tried/managed to install PETSc on e.g. Amazon AWS or > the Google Compute Engine? > > I believe some extra components are needed for coordination, e.g. > Kubernetes or Mesos (in turn requiring that the library be compiled > within some sort of container, e.g. Docker), but I'm a bit lost amid > all the options. > > > I have no idea what those even do. > > Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by > PETSc compatible with those platforms? > > > There are a bunch of papers documenting MPI performance on AWS. 
We > just use vanilla MPI, > so you request a configuration that has it installed. > > Matt > > Thank you in advance, > > Marco > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zocca.marco at gmail.com Mon Jan 4 14:07:05 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 4 Jan 2016 21:07:05 +0100 Subject: [petsc-users] installation on cloud platform Message-ID: Hello Tabrez, thank you for the walkthrough; I'll give it a try as soon as possible. My main doubt was indeed regarding the host resolution and security; what does the special hostfile line do? What about that "ip-x-x-x-x" construct? Thank you and kindest regards, Marco > > Or you can install everything yourself. > > On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to > add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by > > $ cd > $ ssh-keygen -t rsa > $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys > > After that the usual stuff, e.g., > > $ sudo apt-get update > $ sudo apt-get upgrade > $ sudo apt-get install gcc gfortran g++ cmake wget > $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz > $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich > --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 > $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 > $ export PETSC_ARCH=arch-linux2-c-opt > $ make all > $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin > > Tabrez > > > On 01/03/2016 07:07 AM, Matthew Knepley wrote: >> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca > > wrote: >> >> Dear all, >> >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >> the Google Compute Engine? >> >> I believe some extra components are needed for coordination, e.g. >> Kubernetes or Mesos (in turn requiring that the library be compiled >> within some sort of container, e.g. Docker), but I'm a bit lost amid >> all the options. >> >> From balay at mcs.anl.gov Mon Jan 4 14:29:11 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 4 Jan 2016 14:29:11 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: Are you interested in using 1 node [with multiple cores] - or multiple nodes - aka cluster on amazon? For a single node - I don't think any additional config should be necesary [/etc/hosts - or ssh keys]. It should be same as any laptop config. Its possible that 'hostname' is not setup properly on amazon nodes - and MPICH misbehaves. In this case - you might need any entry to /etc/hosts. Perhaps something like: echo 127.0.0.1 `hostname` >> /etc/hosts If cluster - then there might be a tutorial to setup a proper cluster with AWS. Googles gives http://cs.smith.edu/dftwiki/index.php/Tutorial:_Create_an_MPI_Cluster_on_the_Amazon_Elastic_Cloud_%28EC2%29 BTW: --download-mpich is a convient way to install MPI. [we default to device=ch3:sock]. But you might want to figureout if there is a better performing MPI for the amazon config. [perhaps mpich with nemesis works well. Or perhaps openmpi. Both are available prebuit on ubunutu...] Satish On Mon, 4 Jan 2016, Marco Zocca wrote: > Hello Tabrez, > > thank you for the walkthrough; I'll give it a try as soon as possible. > > My main doubt was indeed regarding the host resolution and security; > > what does the special hostfile line do? 
What about that "ip-x-x-x-x" > construct? > > Thank you and kindest regards, > > Marco > > > > > > > > Or you can install everything yourself. > > > > On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to > > add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by > > > > $ cd > > $ ssh-keygen -t rsa > > $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys > > > > After that the usual stuff, e.g., > > > > $ sudo apt-get update > > $ sudo apt-get upgrade > > $ sudo apt-get install gcc gfortran g++ cmake wget > > $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz > > $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich > > --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 > > $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 > > $ export PETSC_ARCH=arch-linux2-c-opt > > $ make all > > $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin > > > > Tabrez > > > > > > On 01/03/2016 07:07 AM, Matthew Knepley wrote: > >> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca >> > wrote: > >> > >> Dear all, > >> > >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or > >> the Google Compute Engine? > >> > >> I believe some extra components are needed for coordination, e.g. > >> Kubernetes or Mesos (in turn requiring that the library be compiled > >> within some sort of container, e.g. Docker), but I'm a bit lost amid > >> all the options. > >> > > >> > From stali at geology.wisc.edu Mon Jan 4 14:33:15 2016 From: stali at geology.wisc.edu (Tabrez Ali) Date: Mon, 04 Jan 2016 14:33:15 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: <568AD70B.9010707@geology.wisc.edu> Its just the hostname. Without it mpiexec was hanging for n>1. Might not be an issue with non Debian AMIs (I didn't try). Tabrez On 01/04/2016 02:07 PM, Marco Zocca wrote: > Hello Tabrez, > > thank you for the walkthrough; I'll give it a try as soon as possible. > > My main doubt was indeed regarding the host resolution and security; > > what does the special hostfile line do? What about that "ip-x-x-x-x" > construct? > > Thank you and kindest regards, > > Marco > > > > >> Or you can install everything yourself. >> >> On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to >> add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by >> >> $ cd >> $ ssh-keygen -t rsa >> $ cat .ssh/id_rsa.pub>> .ssh/authorized_keys >> >> After that the usual stuff, e.g., >> >> $ sudo apt-get update >> $ sudo apt-get upgrade >> $ sudo apt-get install gcc gfortran g++ cmake wget >> $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz >> $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich >> --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 >> $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 >> $ export PETSC_ARCH=arch-linux2-c-opt >> $ make all >> $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin >> >> Tabrez >> >> >> On 01/03/2016 07:07 AM, Matthew Knepley wrote: >>> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca>> > wrote: >>> >>> Dear all, >>> >>> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >>> the Google Compute Engine? >>> >>> I believe some extra components are needed for coordination, e.g. >>> Kubernetes or Mesos (in turn requiring that the library be compiled >>> within some sort of container, e.g. Docker), but I'm a bit lost amid >>> all the options. 
>>> From bsmith at mcs.anl.gov Mon Jan 4 14:49:51 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 14:49:51 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: <568A9435.40103@geology.wisc.edu> References: <568A9435.40103@geology.wisc.edu> Message-ID: <4E2A2D98-E904-47F9-A2A8-CF3611C54EC2@mcs.anl.gov> Tabrez, This is great, thanks for sending it. Do you mind if Satish adds it to the http://www.mcs.anl.gov/petsc/documentation/installation.html file as an example? Barry > On Jan 4, 2016, at 9:48 AM, Tabrez Ali wrote: > > Or you can install everything yourself. > > On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by > > $ cd > $ ssh-keygen -t rsa > $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys > > After that the usual stuff, e.g., > > $ sudo apt-get update > $ sudo apt-get upgrade > $ sudo apt-get install gcc gfortran g++ cmake wget > $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz > $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 > $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 > $ export PETSC_ARCH=arch-linux2-c-opt > $ make all > $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin > > Tabrez > > > On 01/03/2016 07:07 AM, Matthew Knepley wrote: >> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca wrote: >> Dear all, >> >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >> the Google Compute Engine? >> >> I believe some extra components are needed for coordination, e.g. >> Kubernetes or Mesos (in turn requiring that the library be compiled >> within some sort of container, e.g. Docker), but I'm a bit lost amid >> all the options. >> >> I have no idea what those even do. >> >> Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by >> PETSc compatible with those platforms? >> >> There are a bunch of papers documenting MPI performance on AWS. We just use vanilla MPI, >> so you request a configuration that has it installed. >> >> Matt >> >> Thank you in advance, >> >> Marco >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener > From stali at geology.wisc.edu Mon Jan 4 15:08:59 2016 From: stali at geology.wisc.edu (Tabrez Ali) Date: Mon, 04 Jan 2016 15:08:59 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: <4E2A2D98-E904-47F9-A2A8-CF3611C54EC2@mcs.anl.gov> References: <568A9435.40103@geology.wisc.edu> <4E2A2D98-E904-47F9-A2A8-CF3611C54EC2@mcs.anl.gov> Message-ID: <568ADF6B.4050804@geology.wisc.edu> Yes, of course. Although additional steps might be needed for enabling GPU support on GPU enabled instances (hard to find otherwise). Regards, Tabrez On 01/04/2016 02:49 PM, Barry Smith wrote: > Tabrez, > > This is great, thanks for sending it. Do you mind if Satish adds it to the http://www.mcs.anl.gov/petsc/documentation/installation.html file as an example? > > Barry > >> On Jan 4, 2016, at 9:48 AM, Tabrez Ali wrote: >> >> Or you can install everything yourself. 
>> >> On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by >> >> $ cd >> $ ssh-keygen -t rsa >> $ cat .ssh/id_rsa.pub>> .ssh/authorized_keys >> >> After that the usual stuff, e.g., >> >> $ sudo apt-get update >> $ sudo apt-get upgrade >> $ sudo apt-get install gcc gfortran g++ cmake wget >> $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz >> $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 >> $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 >> $ export PETSC_ARCH=arch-linux2-c-opt >> $ make all >> $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin >> >> Tabrez >> >> >> On 01/03/2016 07:07 AM, Matthew Knepley wrote: >>> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca wrote: >>> Dear all, >>> >>> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >>> the Google Compute Engine? >>> >>> I believe some extra components are needed for coordination, e.g. >>> Kubernetes or Mesos (in turn requiring that the library be compiled >>> within some sort of container, e.g. Docker), but I'm a bit lost amid >>> all the options. >>> >>> I have no idea what those even do. >>> >>> Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by >>> PETSc compatible with those platforms? >>> >>> There are a bunch of papers documenting MPI performance on AWS. We just use vanilla MPI, >>> so you request a configuration that has it installed. >>> >>> Matt >>> >>> Thank you in advance, >>> >>> Marco >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener From zocca.marco at gmail.com Mon Jan 4 16:48:57 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 4 Jan 2016 23:48:57 +0100 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: Hi Satish, thank you for the input; I was really looking for something that lets one abstract out "where" the code lives, so as to possibly work both in a single-node and cluster setting. This is why a "container" approach sounds meaningful. Configure once, run many. For message-passing codes such as our case, there's this `docker-compose` [ https://docs.docker.com/compose ] which aggregates the compilation and network setup steps. I have found this approach [ http://qnib.org/2015/04/14/qnibterminal-mpi-hello-world/ ] that runs an MPI benchmark using `docker-compose`. The "Consul" library takes care of the DNS resolution as far as I can tell, and SLURM is the queue manager. The downsides: it's yet another third party tool (albeit a widespread one), with yet another scripting syntax (very much similar but incompatible with shell script). Latencies will be much larger, I expect, and also one should pay a much higher attention to security (building on top of someone else's images, freely available from the Docker Hub, is tantamount to running arbitrary code at compile time). There are however trusted Docker images, containing various combinations of Linux distributions and software. I'm very much looking forward to continuing this discussion; Kind regards, Marco > Are you interested in using 1 node [with multiple cores] - or multiple > nodes - aka cluster on amazon? > > For a single node - I don't think any additional config should be > necesary [/etc/hosts - or ssh keys]. 
It should be same as any laptop > config. > > Its possible that 'hostname' is not setup properly on amazon nodes - and > MPICH misbehaves. In this case - you might need any entry to > /etc/hosts. Perhaps something like: > > echo 127.0.0.1 `hostname` >> /etc/hosts > > If cluster - then there might be a tutorial to setup a proper cluster with AWS. Googles gives > http://cs.smith.edu/dftwiki/index.php/Tutorial:_Create_an_MPI_Cluster_on_the_Amazon_Elastic_Cloud_%28EC2%29 > > BTW: --download-mpich is a convient way to install MPI. [we default to > device=ch3:sock]. But you might want to figureout if there is a better > performing MPI for the amazon config. [perhaps mpich with nemesis > works well. Or perhaps openmpi. Both are available prebuit on > ubunutu...] > > Satish > >> >> >> >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >> >> the Google Compute Engine? >> >> >> >> I believe some extra components are needed for coordination, e.g. >> >> Kubernetes or Mesos (in turn requiring that the library be compiled >> >> within some sort of container, e.g. Docker), but I'm a bit lost amid >> >> all the options. From bsmith at mcs.anl.gov Mon Jan 4 17:10:42 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 17:10:42 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: <49D88394-EF0B-405E-B1F8-342FA35948B8@mcs.anl.gov> Marco, There are competitors to the "regular" cloud machines like Amazon focused specifically on HPC that "come with" MPI and use high speed networks so are much like if you built a custom HPC machine. For example http://www.rescale.com/software/ I don't have direct experiences with any of these systems but suspect that if you really want to scale to a bunch of nodes you are likely far better off with the HPC cloud servers than with general purpose systems even if they have a higher cost (you get what you pay for). Barry > On Jan 4, 2016, at 4:48 PM, Marco Zocca wrote: > > Hi Satish, > > thank you for the input; > > I was really looking for something that lets one abstract out "where" > the code lives, so as to possibly work both in a single-node and > cluster setting. > > This is why a "container" approach sounds meaningful. Configure once, run many. > For message-passing codes such as our case, there's this > `docker-compose` [ https://docs.docker.com/compose ] which aggregates > the compilation and network setup steps. > I have found this approach [ > http://qnib.org/2015/04/14/qnibterminal-mpi-hello-world/ ] that runs > an MPI benchmark using `docker-compose`. The "Consul" library takes > care of the DNS resolution as far as I can tell, and SLURM is the > queue manager. > > The downsides: it's yet another third party tool (albeit a widespread > one), with yet another scripting syntax (very much similar but > incompatible with shell script). > Latencies will be much larger, I expect, and also one should pay a > much higher attention to security (building on top of someone else's > images, freely available from the Docker Hub, is tantamount to running > arbitrary code at compile time). > > There are however trusted Docker images, containing various > combinations of Linux distributions and software. > > I'm very much looking forward to continuing this discussion; > > Kind regards, > > Marco > > > >> Are you interested in using 1 node [with multiple cores] - or multiple >> nodes - aka cluster on amazon? 
>> >> For a single node - I don't think any additional config should be >> necesary [/etc/hosts - or ssh keys]. It should be same as any laptop >> config. >> >> Its possible that 'hostname' is not setup properly on amazon nodes - and >> MPICH misbehaves. In this case - you might need any entry to >> /etc/hosts. Perhaps something like: >> >> echo 127.0.0.1 `hostname` >> /etc/hosts >> >> If cluster - then there might be a tutorial to setup a proper cluster with AWS. Googles gives >> http://cs.smith.edu/dftwiki/index.php/Tutorial:_Create_an_MPI_Cluster_on_the_Amazon_Elastic_Cloud_%28EC2%29 >> >> BTW: --download-mpich is a convient way to install MPI. [we default to >> device=ch3:sock]. But you might want to figureout if there is a better >> performing MPI for the amazon config. [perhaps mpich with nemesis >> works well. Or perhaps openmpi. Both are available prebuit on >> ubunutu...] >> >> Satish >> >>>>> >>>>> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >>>>> the Google Compute Engine? >>>>> >>>>> I believe some extra components are needed for coordination, e.g. >>>>> Kubernetes or Mesos (in turn requiring that the library be compiled >>>>> within some sort of container, e.g. Docker), but I'm a bit lost amid >>>>> all the options. From zonexo at gmail.com Mon Jan 4 19:28:32 2016 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 5 Jan 2016 09:28:32 +0800 Subject: [petsc-users] Segmentation error when calling PetscBarrier Message-ID: <568B1C40.30600@gmail.com> Hi, I am trying to debug my CFD Fortran MPI code. I tried to add: call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" to do a rough check on where the error is. xx is a different number for each line. I found that whenever I add this line, the code aborts with segmentation error. I am using the Intel compiler. Is there any error with my usage? -- Thank you Yours sincerely, TAY wee-beng From knepley at gmail.com Mon Jan 4 19:31:30 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 4 Jan 2016 19:31:30 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B1C40.30600@gmail.com> References: <568B1C40.30600@gmail.com> Message-ID: On Mon, Jan 4, 2016 at 7:28 PM, TAY wee-beng wrote: > Hi, > > I am trying to debug my CFD Fortran MPI code. I tried to add: > > call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > > to do a rough check on where the error is. xx is a different number for > each line. > > I found that whenever I add this line, the code aborts with segmentation > error. > > I am using the Intel compiler. Is there any error with my usage? I don't think this makes sense since it will try to pull a communicator out of the NULL_OBJECT. Matt > > -- > Thank you > > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Mon Jan 4 19:41:28 2016 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 5 Jan 2016 09:41:28 +0800 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: References: <568B1C40.30600@gmail.com> Message-ID: <568B1F48.1050505@gmail.com> Hi Matt, In that case, what would be a good or accurate way to debug the MPI code? I'm trying to determine where the fault lies. 
Thank you Yours sincerely, TAY wee-beng On 5/1/2016 9:31 AM, Matthew Knepley wrote: > On Mon, Jan 4, 2016 at 7:28 PM, TAY wee-beng > wrote: > > Hi, > > I am trying to debug my CFD Fortran MPI code. I tried to add: > > call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > > to do a rough check on where the error is. xx is a different > number for each line. > > I found that whenever I add this line, the code aborts with > segmentation error. > > I am using the Intel compiler. Is there any error with my usage? > > > I don't think this makes sense since it will try to pull a > communicator out of the NULL_OBJECT. > > Matt > > > -- > Thank you > > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 4 19:51:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 4 Jan 2016 19:51:47 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B1F48.1050505@gmail.com> References: <568B1C40.30600@gmail.com> <568B1F48.1050505@gmail.com> Message-ID: On Mon, Jan 4, 2016 at 7:41 PM, TAY wee-beng wrote: > Hi Matt, > > In that case, what would be a good or accurate way to debug the MPI code? > I'm trying to determine where the fault lies. > Is there a problem with -start_in_debugger? Also valgrind --trace-children=yes is great. Thanks, Matt > Thank you > > Yours sincerely, > > TAY wee-beng > > On 5/1/2016 9:31 AM, Matthew Knepley wrote: > > On Mon, Jan 4, 2016 at 7:28 PM, TAY wee-beng wrote: > >> Hi, >> >> I am trying to debug my CFD Fortran MPI code. I tried to add: >> >> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" >> >> to do a rough check on where the error is. xx is a different number for >> each line. >> >> I found that whenever I add this line, the code aborts with segmentation >> error. >> >> I am using the Intel compiler. Is there any error with my usage? > > > I don't think this makes sense since it will try to pull a communicator > out of the NULL_OBJECT. > > Matt > > >> >> -- >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 4 20:32:05 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 20:32:05 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B1C40.30600@gmail.com> References: <568B1C40.30600@gmail.com> Message-ID: <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. Barry The debugger is not a scary monster, it is one of your best friends. > On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: > > Hi, > > I am trying to debug my CFD Fortran MPI code. I tried to add: > > call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > > to do a rough check on where the error is. 
xx is a different number for each line. > > I found that whenever I add this line, the code aborts with segmentation error. > > I am using the Intel compiler. Is there any error with my usage? > > -- > Thank you > > Yours sincerely, > > TAY wee-beng > From zonexo at gmail.com Mon Jan 4 21:34:23 2016 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 5 Jan 2016 11:34:23 +0800 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> References: <568B1C40.30600@gmail.com> <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> Message-ID: <568B39BF.7020108@gmail.com> Hi, Ya sorry, that should be the tool to use. Was having some problems using MPI with the debugger. I managed to run it as a serial code now. My problem is that on the cluster, it works with the gnu fortran. But using Intel compiler, I get segmentation error at some point when running the opt ver. The debug ver works fine. I am trying to find if the error is due to a bug in Intel, or it's my own problem. Another thing is that on another cluster, the Intel opt ver works, but that's using a newer ver of the compiler. I hope to get the Intel one working if possible, because it's about 30% faster. So now coming back to the gdb, it worked fine using the debug ver of the code. But when using the opt ver, it only shows segmentation fault. When the X-window appears, the code has already exited. I am already using -g during compile. So how should I debug it? The error seems to be when I tried to call DMDAVecRestoreArrayF90, although I still need to be more certain. Thank you Yours sincerely, TAY wee-beng On 5/1/2016 10:32 AM, Barry Smith wrote: > You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. > > Barry > > The debugger is not a scary monster, it is one of your best friends. > >> On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: >> >> Hi, >> >> I am trying to debug my CFD Fortran MPI code. I tried to add: >> >> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" >> >> to do a rough check on where the error is. xx is a different number for each line. >> >> I found that whenever I add this line, the code aborts with segmentation error. >> >> I am using the Intel compiler. Is there any error with my usage? >> >> -- >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> From bsmith at mcs.anl.gov Mon Jan 4 22:16:19 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 22:16:19 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B39BF.7020108@gmail.com> References: <568B1C40.30600@gmail.com> <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> <568B39BF.7020108@gmail.com> Message-ID: <348325AD-BEDD-4F06-A040-0A17B8C461BA@mcs.anl.gov> You could try instead -on_error_attach_debugger and see if that is better at catching the code when the error occurs If the code runs valgrind clean (when it runs) then I would say it is reasonable for you to conclude that the current trouble is due to a Intel optimization error and not debug further, Barry > On Jan 4, 2016, at 9:34 PM, TAY wee-beng wrote: > > Hi, > > Ya sorry, that should be the tool to use. Was having some problems using MPI with the debugger. > > I managed to run it as a serial code now. > > My problem is that on the cluster, it works with the gnu fortran. But using Intel compiler, I get segmentation error at some point when running the opt ver. The debug ver works fine. 
> > I am trying to find if the error is due to a bug in Intel, or it's my own problem. > > Another thing is that on another cluster, the Intel opt ver works, but that's using a newer ver of the compiler. > > I hope to get the Intel one working if possible, because it's about 30% faster. > > So now coming back to the gdb, it worked fine using the debug ver of the code. But when using the opt ver, it only shows segmentation fault. When the X-window appears, the code has already exited. I am already using -g during compile. > > So how should I debug it? The error seems to be when I tried to call DMDAVecRestoreArrayF90, although I still need to be more certain. > > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 5/1/2016 10:32 AM, Barry Smith wrote: >> You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. >> >> Barry >> >> The debugger is not a scary monster, it is one of your best friends. >> >>> On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I am trying to debug my CFD Fortran MPI code. I tried to add: >>> >>> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" >>> >>> to do a rough check on where the error is. xx is a different number for each line. >>> >>> I found that whenever I add this line, the code aborts with segmentation error. >>> >>> I am using the Intel compiler. Is there any error with my usage? >>> >>> -- >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> > From balay at mcs.anl.gov Mon Jan 4 22:23:45 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 4 Jan 2016 22:23:45 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <348325AD-BEDD-4F06-A040-0A17B8C461BA@mcs.anl.gov> References: <568B1C40.30600@gmail.com> <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> <568B39BF.7020108@gmail.com> <348325AD-BEDD-4F06-A040-0A17B8C461BA@mcs.anl.gov> Message-ID: verylikely valgrind will find bugs in code.. http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > When the X-window appears, the code has already exited try --debugger_pause 60 [or higher if it takes longer to swawn the xterms] Satish On Mon, 4 Jan 2016, Barry Smith wrote: > > You could try instead -on_error_attach_debugger and see if that is better at catching the code when the error occurs > > If the code runs valgrind clean (when it runs) then I would say it is reasonable for you to conclude that the current trouble is due to a Intel optimization error and not debug further, > > Barry > > > On Jan 4, 2016, at 9:34 PM, TAY wee-beng wrote: > > > > Hi, > > > > Ya sorry, that should be the tool to use. Was having some problems using MPI with the debugger. > > > > I managed to run it as a serial code now. > > > > My problem is that on the cluster, it works with the gnu fortran. But using Intel compiler, I get segmentation error at some point when running the opt ver. The debug ver works fine. > > > > I am trying to find if the error is due to a bug in Intel, or it's my own problem. > > > > Another thing is that on another cluster, the Intel opt ver works, but that's using a newer ver of the compiler. > > > > I hope to get the Intel one working if possible, because it's about 30% faster. > > > > So now coming back to the gdb, it worked fine using the debug ver of the code. But when using the opt ver, it only shows segmentation fault. When the X-window appears, the code has already exited. I am already using -g during compile. > > > > So how should I debug it? 
The error seems to be when I tried to call DMDAVecRestoreArrayF90, although I still need to be more certain. > > > > > > Thank you > > > > Yours sincerely, > > > > TAY wee-beng > > > > On 5/1/2016 10:32 AM, Barry Smith wrote: > >> You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. > >> > >> Barry > >> > >> The debugger is not a scary monster, it is one of your best friends. > >> > >>> On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: > >>> > >>> Hi, > >>> > >>> I am trying to debug my CFD Fortran MPI code. I tried to add: > >>> > >>> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > >>> > >>> to do a rough check on where the error is. xx is a different number for each line. > >>> > >>> I found that whenever I add this line, the code aborts with segmentation error. > >>> > >>> I am using the Intel compiler. Is there any error with my usage? > >>> > >>> -- > >>> Thank you > >>> > >>> Yours sincerely, > >>> > >>> TAY wee-beng > >>> > > > > From amneetb at live.unc.edu Tue Jan 5 17:14:16 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Tue, 5 Jan 2016 23:14:16 +0000 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock Message-ID: Hi Folks, Is it safe to call MatDestroy on the sequential matrix returned by MatGetDiagonalBlock() after it?s no longer used? Thanks, ? Amneet ===================================================== Amneet Bhalla Postdoctoral Research Associate Department of Mathematics and McAllister Heart Institute University of North Carolina at Chapel Hill Email: amneet at unc.edu Web: https://abhalla.web.unc.edu ===================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Jan 5 17:20:14 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 6 Jan 2016 00:20:14 +0100 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: Message-ID: The manpage http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetDiagonalBlock.html indicates the reference counter on the returned matrix (a) isn't incremented. This statement would imply that in the absence of calling PetscObjectReference() yourself, you should not call MatDestroy() on the matrix returned. If you do call MatDestroy(), a double free will occur when you call MatDestroy() on the parent matrix from which you pulled the block matrix out of. Cheers, Dave On 6 January 2016 at 00:14, Bhalla, Amneet Pal S wrote: > Hi Folks, > > Is it safe to call MatDestroy on the sequential matrix returned by > MatGetDiagonalBlock() after it?s no longer used? > > > Thanks, > > ? Amneet > ===================================================== > Amneet Bhalla > Postdoctoral Research Associate > Department of Mathematics and McAllister Heart Institute > University of North Carolina at Chapel Hill > Email: amneet at unc.edu > Web: https://abhalla.web.unc.edu > ===================================================== > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Jan 5 17:21:01 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 5 Jan 2016 17:21:01 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: Message-ID: Looking at example usages in src/ksp/pc/impls/bjacobi/bjacobi.c or src/ksp/pc/impls/gasm/gasm.c - there is no call to MatDestroy.. 
[or MatRestoreDiagonalBlock] Satish On Tue, 5 Jan 2016, Bhalla, Amneet Pal S wrote: > Hi Folks, > > Is it safe to call MatDestroy on the sequential matrix returned by MatGetDiagonalBlock() after it?s no longer used? > > > Thanks, > > ? Amneet > ===================================================== > Amneet Bhalla > Postdoctoral Research Associate > Department of Mathematics and McAllister Heart Institute > University of North Carolina at Chapel Hill > Email: amneet at unc.edu > Web: https://abhalla.web.unc.edu > ===================================================== > > From amneetb at live.unc.edu Tue Jan 5 17:24:33 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Tue, 5 Jan 2016 23:24:33 +0000 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: Message-ID: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> On Jan 5, 2016, at 3:20 PM, Dave May > wrote: This statement would imply that in the absence of calling PetscObjectReference() yourself, you should not call MatDestroy() on the matrix returned Got it. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 5 17:32:18 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 5 Jan 2016 17:32:18 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> Message-ID: In general XXXGetYYY() do not increase the reference count and you should not destroy. Some XXXGetYYY() have a corresponding XXXRestoreYYY(). XXXCreateYYY() DO increase the reference count and should have destroy called. So get -> no destroy create -> destroy Barry In the past we were not consistent between the usages but now I think it is consistent. > On Jan 5, 2016, at 5:24 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 5, 2016, at 3:20 PM, Dave May wrote: >> >> This statement would imply that in the absence of calling PetscObjectReference() yourself, you should not call MatDestroy() on the matrix returned > > Got it. Thanks! > From amneetb at live.unc.edu Tue Jan 5 17:46:58 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Tue, 5 Jan 2016 23:46:58 +0000 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> Message-ID: <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> On Jan 5, 2016, at 3:32 PM, Barry Smith > wrote: So get -> no destroy create -> destroy Is MatGetSubMatrices() exception to this rule? The manual says to call the destroy() function after done with it. http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 5 17:53:40 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 5 Jan 2016 17:53:40 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> Message-ID: <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> Yeah, looks like MatGetSubMatrix() and MatGetSubMatrices() didn't get renamed to the "current" approach. 
Barry > On Jan 5, 2016, at 5:46 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 5, 2016, at 3:32 PM, Barry Smith wrote: >> >> So get -> no destroy >> create -> destroy > > Is MatGetSubMatrices() exception to this rule? The manual says to call the destroy() function after done with it. > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html From knepley at gmail.com Tue Jan 5 19:12:32 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 5 Jan 2016 19:12:32 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> Message-ID: How devastating would it be for Deal.II if we renamed them MatCreateSubMatrix()? ;) Matt On Tue, Jan 5, 2016 at 5:53 PM, Barry Smith wrote: > > Yeah, looks like MatGetSubMatrix() and MatGetSubMatrices() didn't get > renamed to the "current" approach. > > Barry > > > On Jan 5, 2016, at 5:46 PM, Bhalla, Amneet Pal S > wrote: > > > > > > > >> On Jan 5, 2016, at 3:32 PM, Barry Smith wrote: > >> > >> So get -> no destroy > >> create -> destroy > > > > Is MatGetSubMatrices() exception to this rule? The manual says to call > the destroy() function after done with it. > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jan 5 19:29:51 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 05 Jan 2016 18:29:51 -0700 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> Message-ID: <87twmr3374.fsf@jedbrown.org> Matthew Knepley writes: > How devastating would it be for Deal.II if we renamed them > MatCreateSubMatrix()? ;) I know it's consistent with respect to reference counting semantics, but it might be harder for new users to find when searching the docs. I have no data either way. I recall discussing years ago that having paired MatGetSubMatrix/MatRestoreSubMatrix would simplify bookkeeping in PCFieldSplit and some common user code. If the name is changed, it's easy to support the deprecated name from C. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jan 5 19:36:47 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 5 Jan 2016 19:36:47 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <87twmr3374.fsf@jedbrown.org> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> <87twmr3374.fsf@jedbrown.org> Message-ID: <84C0B347-61B0-4031-A1AB-341C9C6B0B35@mcs.anl.gov> > On Jan 5, 2016, at 7:29 PM, Jed Brown wrote: > > Matthew Knepley writes: > >> How devastating would it be for Deal.II if we renamed them >> MatCreateSubMatrix()? 
;) > > I know it's consistent with respect to reference counting semantics, but > it might be harder for new users to find when searching the docs. I > have no data either way. I recall discussing years ago that having > paired MatGetSubMatrix/MatRestoreSubMatrix would simplify bookkeeping in > PCFieldSplit and some common user code. Hmm, I don't recall this at all but it sounds like an intriguing idea. Barry > > If the name is changed, it's easy to support the deprecated name from C. From jychang48 at gmail.com Tue Jan 5 20:53:03 2016 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 5 Jan 2016 19:53:03 -0700 Subject: [petsc-users] Array of SNES's Message-ID: Hi all, Is it possible to create an array of SNES's? If I have a problem size N degrees of freedom, I want each dof to have its own SNES solver (basically a pointer to N SNES's). Reason for this is because I am performing a "post-processing" step where after my global solve, each entry of my solution vector of size N will go through some algebraic manipulation. If I did a standard LU solve for these individual SNES's, I could use the same snes and this issue would be moot. But i am using the Variational Inequality, which basically requires a fresh SNES for each problem. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Tue Jan 5 22:24:11 2016 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 5 Jan 2016 21:24:11 -0700 Subject: [petsc-users] Array of SNES's In-Reply-To: References: Message-ID: Timothee, No i haven't tried, mainly because I don't know how. Btw I am not doing this in C or FORTRAN, I want to do this in python (via petsc4py) since I am trying to make this compatible with Firedrake (which is also python-based). Thanks, Justin On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas wrote: > Hello and happy new year, > > Have you actually tried ? I just declared an array of 10 snes and created > them, and there is no complaint whatsoever. Also, something I do usually is > that I declare a derived type which contains some Petsc Objects (like SNES, > KSP, matrices, vectors, whatever), and create arrays of this derived types. > This works perfectly fine in my case (I use FORTRAN btw). > > Best wishes > > Timothee > > > 2016-01-06 11:53 GMT+09:00 Justin Chang : > >> Hi all, >> >> Is it possible to create an array of SNES's? If I have a problem size N >> degrees of freedom, I want each dof to have its own SNES solver (basically >> a pointer to N SNES's). Reason for this is because I am performing a >> "post-processing" step where after my global solve, each entry of my >> solution vector of size N will go through some algebraic manipulation. >> >> If I did a standard LU solve for these individual SNES's, I could use the >> same snes and this issue would be moot. But i am using the Variational >> Inequality, which basically requires a fresh SNES for each problem. >> >> Thanks, >> Justin >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Tue Jan 5 22:28:50 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Wed, 6 Jan 2016 13:28:50 +0900 Subject: [petsc-users] Array of SNES's In-Reply-To: References: Message-ID: (Sorry I forgot to answer all in the first message) Well I have never used Petsc in python, but in FORTRAN, it seems to work like any array. So why not use a python list for instance ? 
You would start with SNESs = [], create a new snes and append it to the list with SNESs.append(snes). Then you can use your list. That would not do it ? Timothee 2016-01-06 13:24 GMT+09:00 Justin Chang : > Timothee, > > No i haven't tried, mainly because I don't know how. Btw I am not doing > this in C or FORTRAN, I want to do this in python (via petsc4py) since I am > trying to make this compatible with Firedrake (which is also python-based). > > Thanks, > Justin > > On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Hello and happy new year, >> >> Have you actually tried ? I just declared an array of 10 snes and created >> them, and there is no complaint whatsoever. Also, something I do usually is >> that I declare a derived type which contains some Petsc Objects (like SNES, >> KSP, matrices, vectors, whatever), and create arrays of this derived types. >> This works perfectly fine in my case (I use FORTRAN btw). >> >> Best wishes >> >> Timothee >> >> >> 2016-01-06 11:53 GMT+09:00 Justin Chang : >> >>> Hi all, >>> >>> Is it possible to create an array of SNES's? If I have a problem size N >>> degrees of freedom, I want each dof to have its own SNES solver (basically >>> a pointer to N SNES's). Reason for this is because I am performing a >>> "post-processing" step where after my global solve, each entry of my >>> solution vector of size N will go through some algebraic manipulation. >>> >>> If I did a standard LU solve for these individual SNES's, I could use >>> the same snes and this issue would be moot. But i am using the Variational >>> Inequality, which basically requires a fresh SNES for each problem. >>> >>> Thanks, >>> Justin >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Tue Jan 5 22:30:30 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 5 Jan 2016 20:30:30 -0800 Subject: [petsc-users] Array of SNES's In-Reply-To: References: Message-ID: <20160106043030.GA15679@Patricks-MacBook-Pro-4.local> Do all the SNES's need to be constructed at the same time? It will obviously require a lot of memory to store N SNES objects (or perhaps your N is small), and if they don't all need to exist simultaneously, then do you have the option to create and destroy one at a time as you loop over your grid points? On Tue, Jan 05, 2016 at 09:24:11PM -0700, Justin Chang wrote: > Timothee, > > No i haven't tried, mainly because I don't know how. Btw I am not doing > this in C or FORTRAN, I want to do this in python (via petsc4py) since I am > trying to make this compatible with Firedrake (which is also python-based). > > Thanks, > Justin > > On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas > wrote: > > > Hello and happy new year, > > > > Have you actually tried ? I just declared an array of 10 snes and created > > them, and there is no complaint whatsoever. Also, something I do usually is > > that I declare a derived type which contains some Petsc Objects (like SNES, > > KSP, matrices, vectors, whatever), and create arrays of this derived types. > > This works perfectly fine in my case (I use FORTRAN btw). > > > > Best wishes > > > > Timothee > > > > > > 2016-01-06 11:53 GMT+09:00 Justin Chang : > > > >> Hi all, > >> > >> Is it possible to create an array of SNES's? If I have a problem size N > >> degrees of freedom, I want each dof to have its own SNES solver (basically > >> a pointer to N SNES's). 
Reason for this is because I am performing a > >> "post-processing" step where after my global solve, each entry of my > >> solution vector of size N will go through some algebraic manipulation. > >> > >> If I did a standard LU solve for these individual SNES's, I could use the > >> same snes and this issue would be moot. But i am using the Variational > >> Inequality, which basically requires a fresh SNES for each problem. > >> > >> Thanks, > >> Justin > >> > > > > From jychang48 at gmail.com Wed Jan 6 01:29:04 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 6 Jan 2016 00:29:04 -0700 Subject: [petsc-users] Array of SNES's In-Reply-To: <20160106043030.GA15679@Patricks-MacBook-Pro-4.local> References: <20160106043030.GA15679@Patricks-MacBook-Pro-4.local> Message-ID: Okay so i think there's no need for this in my case. Doing a standard NewtonLS and using the same SNES was no issue at all. My original issue was dealing with the Variational Inequality at each grid point which seemed to break down unless I "reset" the SNES. But when I use these options: -snes_fd -ksp_type preonly -pc_type lu, it works perfectly if I use the same snes for all N grid points. Yes I sacrifice some time forming a FD, but it's not as great as creating new SNES objects each time. Strange Justin On Tue, Jan 5, 2016 at 9:30 PM, Patrick Sanan wrote: > Do all the SNES's need to be constructed at the same time? It will > obviously require a lot of memory to store N SNES objects (or perhaps > your N is small), and if they don't all need to exist simultaneously, then > do you have the option to > create and destroy one at a time as you loop over your grid points? > On Tue, Jan 05, 2016 at 09:24:11PM -0700, Justin Chang wrote: > > Timothee, > > > > No i haven't tried, mainly because I don't know how. Btw I am not doing > > this in C or FORTRAN, I want to do this in python (via petsc4py) since I > am > > trying to make this compatible with Firedrake (which is also > python-based). > > > > Thanks, > > Justin > > > > On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com > > > wrote: > > > > > Hello and happy new year, > > > > > > Have you actually tried ? I just declared an array of 10 snes and > created > > > them, and there is no complaint whatsoever. Also, something I do > usually is > > > that I declare a derived type which contains some Petsc Objects (like > SNES, > > > KSP, matrices, vectors, whatever), and create arrays of this derived > types. > > > This works perfectly fine in my case (I use FORTRAN btw). > > > > > > Best wishes > > > > > > Timothee > > > > > > > > > 2016-01-06 11:53 GMT+09:00 Justin Chang : > > > > > >> Hi all, > > >> > > >> Is it possible to create an array of SNES's? If I have a problem size > N > > >> degrees of freedom, I want each dof to have its own SNES solver > (basically > > >> a pointer to N SNES's). Reason for this is because I am performing a > > >> "post-processing" step where after my global solve, each entry of my > > >> solution vector of size N will go through some algebraic manipulation. > > >> > > >> If I did a standard LU solve for these individual SNES's, I could use > the > > >> same snes and this issue would be moot. But i am using the Variational > > >> Inequality, which basically requires a fresh SNES for each problem. > > >> > > >> Thanks, > > >> Justin > > >> > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From orxan.shibli at gmail.com Wed Jan 6 06:21:40 2016 From: orxan.shibli at gmail.com (Orxan Shibliyev) Date: Wed, 6 Jan 2016 14:21:40 +0200 Subject: [petsc-users] Shared library error Message-ID: I got the following error after my code worked until some time. I don't know why the error was given while the code was working properly but not in the beginning. Error message: ./out: /usr/lib64/libcrypto.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libssl.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libcrypto.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libssl.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libcrypto.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libssl.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 6 07:43:14 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 6 Jan 2016 07:43:14 -0600 Subject: [petsc-users] Shared library error In-Reply-To: References: Message-ID: Does this prevent it from running? It just looks like those libraries have a documentation problem. Matt On Wed, Jan 6, 2016 at 6:21 AM, Orxan Shibliyev wrote: > I got the following error after my code worked until some time. I don't > know why the error was given while the code was working properly but not in > the beginning. > > Error message: > > ./out: /usr/lib64/libcrypto.so.10: no version information available > (required by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libssl.so.10: no version information available (required > by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libcrypto.so.10: no version information available > (required by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libssl.so.10: no version information available (required > by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libcrypto.so.10: no version information available > (required by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libssl.so.10: no version information available (required > by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From timothee.nicolas at gmail.com Thu Jan 7 07:49:47 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Thu, 7 Jan 2016 22:49:47 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free Message-ID: Hello everyone, I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? Thx Best Timoth?e [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type shell does not support getting diagonal block [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 7 08:06:56 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 7 Jan 2016 08:06:56 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: Message-ID: On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > Hello everyone, > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as > a preconditioner/smoother. The linear problem I am solving at this stage > lives in a subspace with 3 degrees of freedom, which represent the 3 > components of a 3D vector. In particular for multigrid, using BJACOBI > instead of JACOBI as a smoother changes everything in terms of efficiency. > I know it because I have tested with the actual matrix in matrix format for > my problem. However, eventually, I want to be matrix free. 
> > My question is, what are the operations I need to provide for the > matrix-free approach to accept BJACOBI ? I am confused because when I try > to apply BJACOBI to my matrix-free operator; the code asks for > MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my > understanding, returns a uniprocessor matrix representing the diagonal part > of the matrix on this processor (as defined in the manual). Instead, I > would expect that what is needed is a routine which returns a 3x3 matrix at > the grid point (that is, the block associated with this grid point, > coupling the 3 components of the vector together). How does this work ? Do > I simply need to code MatGetDiagonalBlock ? > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You would need to implement that function, or write a custom block Jacobi for this matrix. Thanks, Matt > > Thx > Best > > Timoth?e > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by > timothee Thu Jan 7 22:41:13 2016 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in > /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > [0]PETSC ERROR: #3 PCSetUp() line 982 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSPSetUp() line 332 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #5 KSPSolve() line 546 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Thu Jan 7 08:08:46 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Thu, 7 Jan 2016 23:08:46 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: Message-ID: Ok, so it should be sufficient. Great, I think I can do it. Best Timoth?e 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Hello everyone, >> >> I have discovered that I need to use Block Jacobi, rather than Jacobi, as >> a preconditioner/smoother. The linear problem I am solving at this stage >> lives in a subspace with 3 degrees of freedom, which represent the 3 >> components of a 3D vector. In particular for multigrid, using BJACOBI >> instead of JACOBI as a smoother changes everything in terms of efficiency. >> I know it because I have tested with the actual matrix in matrix format for >> my problem. However, eventually, I want to be matrix free. 
>> >> My question is, what are the operations I need to provide for the >> matrix-free approach to accept BJACOBI ? I am confused because when I try >> to apply BJACOBI to my matrix-free operator; the code asks for >> MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my >> understanding, returns a uniprocessor matrix representing the diagonal part >> of the matrix on this processor (as defined in the manual). Instead, I >> would expect that what is needed is a routine which returns a 3x3 matrix at >> the grid point (that is, the block associated with this grid point, >> coupling the 3 components of the vector together). How does this work ? Do >> I simply need to code MatGetDiagonalBlock ? >> > > Just like Jacobi does not request one diagonal element at a time, > Block-Jacobi does not request one diagonal block at a time. You > would need to implement that function, or write a custom block Jacobi for > this matrix. > > Thanks, > > Matt > > >> >> Thx >> Best >> >> Timoth?e >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: Matrix type shell does not support getting diagonal block >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >> [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by >> timothee Thu Jan 7 22:41:13 2016 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >> --with-fc=gfortran --download-fblaslapack --download-mpich >> [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in >> /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c >> [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c >> [0]PETSC ERROR: #3 PCSetUp() line 982 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #4 KSPSetUp() line 332 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #5 KSPSolve() line 546 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 7 11:38:58 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 7 Jan 2016 11:38:58 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: Message-ID: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Timothee, You are mixing up block Jacobi PCBJACOBI (which in PETSc generally uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means all the degrees of freedom associated with a single grid point -- in your case 3). If you are doing matrix free with a shell matrix then you need to provide your own MatInvertBlockDiagonal() which in your case would invert each of your little 3 by 3 blocks and store the result in a 1d array; each little block in column major order followed by the next one. See for example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return a block size of 3. Barry > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas wrote: > > Ok, so it should be sufficient. 
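A rough C sketch of what Barry describes above for a shell matrix and point-block Jacobi. Everything here is invented for illustration (the context type ShellCtx, its ibdiag array, and the routine name); the thread also notes this hook is not reachable from Fortran. The two essential points from Barry's description are the column-major 3x3 inverses stored one after another in a single array, and the block size of 3 on the matrix.

#include <petscmat.h>

typedef struct {
  PetscInt     npoints;   /* number of local grid points              */
  PetscScalar *ibdiag;    /* 9*npoints entries: the inverted 3x3 blocks */
} ShellCtx;

static PetscErrorCode ShellInvertBlockDiagonal(Mat A,const PetscScalar **values)
{
  ShellCtx       *ctx;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A,&ctx);CHKERRQ(ierr);
  /* Fill ctx->ibdiag[9*i .. 9*i+8] with the inverse of the 3x3 block at grid
     point i, stored in column-major order, one block after the next.         */
  /* ... problem-specific inversion of each 3x3 block goes here ...           */
  *values = ctx->ibdiag;
  PetscFunctionReturn(0);
}

/* At setup time, after MatCreateShell():
     MatSetBlockSize(A,3);
     MatShellSetOperation(A,MATOP_INVERT_BLOCK_DIAGONAL,
                          (void (*)(void))ShellInvertBlockDiagonal);
   after which -pc_type pbjacobi should be able to use the shell matrix. */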
Great, I think I can do it. > > Best > > Timoth?e > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > Hello everyone, > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. > > My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? > > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You > would need to implement that function, or write a custom block Jacobi for this matrix. > > Thanks, > > Matt > > > Thx > Best > > Timoth?e > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From timothee.nicolas at gmail.com Thu Jan 7 19:58:38 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 8 Jan 2016 10:58:38 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but still I may need PCBJACOBI. 
The problem is that I don't seem to be allowed to define the matrix operation for MatGetDiagonalBlock... Indeed, I don't find MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h Therefore, when I try to define it, I get the following error at compilation (quite logically) matrices.F90(174): error #6404: This name does not have a type, and must have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] call MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) ---------------------------------------------^ Also, if I change my mind and instead decide to go for PCPBJACOBI, I still have a problem because the manual says that the routine you talk about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I cannot call it. I still cannot call it after I provide a routine corresponding to MATOP_INVERT_BLOCK_DIAGONAL. So, it seems to mean that if I want to use this kind of algorithms, I will have to hard code them, which would be too bad. Is that right, or is there an other way around these two issues ? Best Timothee 2016-01-08 2:38 GMT+09:00 Barry Smith : > > Timothee, > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally > uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means > all the degrees of freedom associated with a single grid point -- in your > case 3). > > If you are doing matrix free with a shell matrix then you need to > provide your own MatInvertBlockDiagonal() which in your case would invert > each of your little 3 by 3 blocks and store the result in a 1d array; each > little block in column major order followed by the next one. See for > example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return > a block size of 3. > > > Barry > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas > wrote: > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > Best > > > > Timoth?e > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > > Hello everyone, > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, > as a preconditioner/smoother. The linear problem I am solving at this stage > lives in a subspace with 3 degrees of freedom, which represent the 3 > components of a 3D vector. In particular for multigrid, using BJACOBI > instead of JACOBI as a smoother changes everything in terms of efficiency. > I know it because I have tested with the actual matrix in matrix format for > my problem. However, eventually, I want to be matrix free. > > > > My question is, what are the operations I need to provide for the > matrix-free approach to accept BJACOBI ? I am confused because when I try > to apply BJACOBI to my matrix-free operator; the code asks for > MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my > understanding, returns a uniprocessor matrix representing the diagonal part > of the matrix on this processor (as defined in the manual). Instead, I > would expect that what is needed is a routine which returns a 3x3 matrix at > the grid point (that is, the block associated with this grid point, > coupling the 3 components of the vector together). How does this work ? Do > I simply need to code MatGetDiagonalBlock ? > > > > Just like Jacobi does not request one diagonal element at a time, > Block-Jacobi does not request one diagonal block at a time. You > > would need to implement that function, or write a custom block Jacobi > for this matrix. 
> > > > Thanks, > > > > Matt > > > > > > Thx > > Best > > > > Timoth?e > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by > timothee Thu Jan 7 22:41:13 2016 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-fblaslapack --download-mpich > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in > /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > [0]PETSC ERROR: #3 PCSetUp() line 982 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #5 KSPSolve() line 546 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 7 20:06:29 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 7 Jan 2016 20:06:29 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: > On Jan 7, 2016, at 7:58 PM, Timoth?e Nicolas wrote: > > I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but still I may need PCBJACOBI. Note that using PCBJACOBI means you are providing big blocks of the Jacobian. If you do provide big blocks of the Jacobian you might as well just provide the entire Jacobin IMHO. Anyways the easiest way to do either PCPBJACOBI or PCBJACOBI is to explicitly construct the portion of the Jacobian you need, in a AIJ or BAIJ matrix and pass that as the SECOND matrix argument to KSPSetOperator() or SNESSetJacobian() then PETSc will use the piece you provide to build the preconditioner. So for example if you want PBJACOBI you would create a BAIJ matrix with block size 3 and only fill up the 3 by 3 block diagonal with Jacobian entries. Barry > The problem is that I don't seem to be allowed to define the matrix operation for MatGetDiagonalBlock... Indeed, I don't find > > MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h > > Therefore, when I try to define it, I get the following error at compilation (quite logically) > > matrices.F90(174): error #6404: This name does not have a type, and must have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] > call MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) > ---------------------------------------------^ > > Also, if I change my mind and instead decide to go for PCPBJACOBI, I still have a problem because the manual says that the routine you talk about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I cannot call it. 
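A hedged sketch of the BAIJ route Barry suggests above: build only the 3x3 diagonal blocks in an explicit matrix and hand it to the solver as the second (preconditioning) operator, keeping the shell matrix as the first. SetupPointBlockPC, Ashell, and the assembly loop are placeholders, and the preallocation assumes exactly one diagonal block per block row.

#include <petscksp.h>

/* Ashell: the matrix-free operator (a MATSHELL); nlocal: local rows, a multiple of 3. */
PetscErrorCode SetupPointBlockPC(KSP ksp,Mat Ashell,PetscInt nlocal)
{
  Mat            Pbdiag;
  PetscInt       p,rstart;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* one 3x3 block per block row, all on the diagonal, so very little extra memory */
  ierr = MatCreateBAIJ(PETSC_COMM_WORLD,3,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE,
                       1,NULL,0,NULL,&Pbdiag);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(Pbdiag,&rstart,NULL);CHKERRQ(ierr);
  for (p = 0; p < nlocal/3; ++p) {
    PetscScalar block[9] = {0.0};       /* replace with the 3x3 Jacobian block, column major */
    PetscInt    brow     = rstart/3+p;  /* global block-row index of this grid point          */
    ierr = MatSetValuesBlocked(Pbdiag,1,&brow,1,&brow,block,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(Pbdiag,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Pbdiag,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* matrix-free operator first, explicit block-diagonal piece second */
  ierr = KSPSetOperators(ksp,Ashell,Pbdiag);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With this in place, -pc_type pbjacobi (or bjacobi) builds the preconditioner from Pbdiag while the Krylov method still applies the matrix-free operator.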
I still cannot call it after I provide a routine corresponding to MATOP_INVERT_BLOCK_DIAGONAL. > > So, it seems to mean that if I want to use this kind of algorithms, I will have to hard code them, which would be too bad. Is that right, or is there an other way around these two issues ? > > Best > > Timothee > > > > > 2016-01-08 2:38 GMT+09:00 Barry Smith : > > Timothee, > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means all the degrees of freedom associated with a single grid point -- in your case 3). > > If you are doing matrix free with a shell matrix then you need to provide your own MatInvertBlockDiagonal() which in your case would invert each of your little 3 by 3 blocks and store the result in a 1d array; each little block in column major order followed by the next one. See for example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return a block size of 3. > > > Barry > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas wrote: > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > Best > > > > Timoth?e > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > > Hello everyone, > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. > > > > My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? > > > > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You > > would need to implement that function, or write a custom block Jacobi for this matrix. > > > > Thanks, > > > > Matt > > > > > > Thx > > Best > > > > Timoth?e > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > From timothee.nicolas at gmail.com Thu Jan 7 20:31:35 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 8 Jan 2016 11:31:35 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: Ah, I understand, so by allocating this BAIJ in an intelligent way (allocating only the diagonal 3x3 blocks), I can still be basically memory efficient, and use matrix-free formulation for the first matrix in KSPSetOperator, right ? Timothee 2016-01-08 11:06 GMT+09:00 Barry Smith : > > > On Jan 7, 2016, at 7:58 PM, Timoth?e Nicolas > wrote: > > > > I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but > still I may need PCBJACOBI. > > Note that using PCBJACOBI means you are providing big blocks of the > Jacobian. If you do provide big blocks of the Jacobian you might as well > just provide the entire Jacobin IMHO. > > Anyways the easiest way to do either PCPBJACOBI or PCBJACOBI is to > explicitly construct the portion of the Jacobian you need, in a AIJ or BAIJ > matrix and pass that as the SECOND matrix argument to KSPSetOperator() or > SNESSetJacobian() then PETSc will use the piece you provide to build the > preconditioner. So for example if you want PBJACOBI you would create a BAIJ > matrix with block size 3 and only fill up the 3 by 3 block diagonal with > Jacobian entries. > > > Barry > > > > The problem is that I don't seem to be allowed to define the matrix > operation for MatGetDiagonalBlock... Indeed, I don't find > > > > MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h > > > > Therefore, when I try to define it, I get the following error at > compilation (quite logically) > > > > matrices.F90(174): error #6404: This name does not have a type, and must > have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] > > call > MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) > > ---------------------------------------------^ > > > > Also, if I change my mind and instead decide to go for PCPBJACOBI, I > still have a problem because the manual says that the routine you talk > about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I > cannot call it. I still cannot call it after I provide a routine > corresponding to MATOP_INVERT_BLOCK_DIAGONAL. 
> > > > So, it seems to mean that if I want to use this kind of algorithms, I > will have to hard code them, which would be too bad. Is that right, or is > there an other way around these two issues ? > > > > Best > > > > Timothee > > > > > > > > > > 2016-01-08 2:38 GMT+09:00 Barry Smith : > > > > Timothee, > > > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally > uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means > all the degrees of freedom associated with a single grid point -- in your > case 3). > > > > If you are doing matrix free with a shell matrix then you need to > provide your own MatInvertBlockDiagonal() which in your case would invert > each of your little 3 by 3 blocks and store the result in a 1d array; each > little block in column major order followed by the next one. See for > example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return > a block size of 3. > > > > > > Barry > > > > > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > > > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > > > Best > > > > > > Timoth?e > > > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > > > Hello everyone, > > > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, > as a preconditioner/smoother. The linear problem I am solving at this stage > lives in a subspace with 3 degrees of freedom, which represent the 3 > components of a 3D vector. In particular for multigrid, using BJACOBI > instead of JACOBI as a smoother changes everything in terms of efficiency. > I know it because I have tested with the actual matrix in matrix format for > my problem. However, eventually, I want to be matrix free. > > > > > > My question is, what are the operations I need to provide for the > matrix-free approach to accept BJACOBI ? I am confused because when I try > to apply BJACOBI to my matrix-free operator; the code asks for > MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my > understanding, returns a uniprocessor matrix representing the diagonal part > of the matrix on this processor (as defined in the manual). Instead, I > would expect that what is needed is a routine which returns a 3x3 matrix at > the grid point (that is, the block associated with this grid point, > coupling the 3 components of the vector together). How does this work ? Do > I simply need to code MatGetDiagonalBlock ? > > > > > > Just like Jacobi does not request one diagonal element at a time, > Block-Jacobi does not request one diagonal block at a time. You > > > would need to implement that function, or write a custom block Jacobi > for this matrix. > > > > > > Thanks, > > > > > > Matt > > > > > > > > > Thx > > > Best > > > > > > Timoth?e > > > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [0]PETSC ERROR: No support for this operation for this object type > > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal > block > > > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by > timothee Thu Jan 7 22:41:13 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-fblaslapack --download-mpich > > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in > /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > > [0]PETSC ERROR: #3 PCSetUp() line 982 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > [0]PETSC ERROR: #5 KSPSolve() line 546 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 7 20:37:03 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 7 Jan 2016 20:37:03 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: <9563B240-921E-4D5D-AA86-E4D07C74362D@mcs.anl.gov> > On Jan 7, 2016, at 8:31 PM, Timoth?e Nicolas wrote: > > Ah, I understand, so by allocating this BAIJ in an intelligent way (allocating only the diagonal 3x3 blocks), I can still be basically memory efficient, and use matrix-free formulation for the first matrix in KSPSetOperator, right ? Exactly > > Timothee > > 2016-01-08 11:06 GMT+09:00 Barry Smith : > > > On Jan 7, 2016, at 7:58 PM, Timoth?e Nicolas wrote: > > > > I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but still I may need PCBJACOBI. > > Note that using PCBJACOBI means you are providing big blocks of the Jacobian. If you do provide big blocks of the Jacobian you might as well just provide the entire Jacobin IMHO. > > Anyways the easiest way to do either PCPBJACOBI or PCBJACOBI is to explicitly construct the portion of the Jacobian you need, in a AIJ or BAIJ matrix and pass that as the SECOND matrix argument to KSPSetOperator() or SNESSetJacobian() then PETSc will use the piece you provide to build the preconditioner. So for example if you want PBJACOBI you would create a BAIJ matrix with block size 3 and only fill up the 3 by 3 block diagonal with Jacobian entries. > > > Barry > > > > The problem is that I don't seem to be allowed to define the matrix operation for MatGetDiagonalBlock... Indeed, I don't find > > > > MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h > > > > Therefore, when I try to define it, I get the following error at compilation (quite logically) > > > > matrices.F90(174): error #6404: This name does not have a type, and must have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] > > call MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) > > ---------------------------------------------^ > > > > Also, if I change my mind and instead decide to go for PCPBJACOBI, I still have a problem because the manual says that the routine you talk about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I cannot call it. 
I still cannot call it after I provide a routine corresponding to MATOP_INVERT_BLOCK_DIAGONAL. > > > > So, it seems to mean that if I want to use this kind of algorithms, I will have to hard code them, which would be too bad. Is that right, or is there an other way around these two issues ? > > > > Best > > > > Timothee > > > > > > > > > > 2016-01-08 2:38 GMT+09:00 Barry Smith : > > > > Timothee, > > > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means all the degrees of freedom associated with a single grid point -- in your case 3). > > > > If you are doing matrix free with a shell matrix then you need to provide your own MatInvertBlockDiagonal() which in your case would invert each of your little 3 by 3 blocks and store the result in a 1d array; each little block in column major order followed by the next one. See for example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return a block size of 3. > > > > > > Barry > > > > > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas wrote: > > > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > > > Best > > > > > > Timoth?e > > > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > > > Hello everyone, > > > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. > > > > > > My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? > > > > > > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You > > > would need to implement that function, or write a custom block Jacobi for this matrix. > > > > > > Thanks, > > > > > > Matt > > > > > > > > > Thx > > > Best > > > > > > Timoth?e > > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: No support for this operation for this object type > > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > > [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > -- Norbert Wiener > > > > > > > > > From orxan.shibli at gmail.com Fri Jan 8 07:33:00 2016 From: orxan.shibli at gmail.com (Orxan Shibliyev) Date: Fri, 8 Jan 2016 15:33:00 +0200 Subject: [petsc-users] blas and lapack directory Message-ID: I am trying to configure petsc by giving blas and lapack directory myself. First of all, can I just give the directory of lapack for --with-blas-lapack-dir since blas is included in lapack. Secondly, I tried what I said above but I got the message that the folder I provided cannot be used. I am not sure if this is due to permission rights on cluster I work on or due to a configure mistake. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 8 07:37:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 8 Jan 2016 07:37:47 -0600 Subject: [petsc-users] blas and lapack directory In-Reply-To: References: Message-ID: For any configure question, you need to send configure.log Matt On Fri, Jan 8, 2016 at 7:33 AM, Orxan Shibliyev wrote: > I am trying to configure petsc by giving blas and lapack directory myself. > First of all, can I just give the directory of lapack > for --with-blas-lapack-dir since blas is included in lapack. Secondly, I > tried what I said above but I got the message that the folder I provided > cannot be used. I am not sure if this is due to permission rights on > cluster I work on or due to a configure mistake. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Fri Jan 8 14:28:17 2016 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Fri, 08 Jan 2016 15:28:17 -0500 Subject: [petsc-users] question about SNESLINESEARCHBT Message-ID: <56901BE1.7010605@purdue.edu> Dear Petsc developers and users, I solve nonlinear systems with Newton Broyden method with Petsc. I use line search algorithm of type SNESLINESEARCHBT. Usually it works, but sometimes diverges. I want to change its parameters that are listed in http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLINESEARCHBT.html#SNESLINESEARCHBT Is there any way to set them in the code rather then from the command line? Thank you, Michael. 
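On the question just above, setting the SNESLINESEARCHBT parameters in code rather than on the command line: the reply that follows names the relevant calls, and a small hedged sketch of using them, with placeholder tolerance values, might look like the following.

#include <petscsnes.h>

/* Call after the SNES has been created; all numbers below are placeholders. */
PetscErrorCode ConfigureBTLineSearch(SNES snes)
{
  SNESLineSearch linesearch;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetLineSearch(snes,&linesearch);CHKERRQ(ierr);
  ierr = SNESLineSearchSetType(linesearch,SNESLINESEARCHBT);CHKERRQ(ierr);
  ierr = SNESLineSearchSetOrder(linesearch,SNES_LINESEARCH_ORDER_CUBIC);CHKERRQ(ierr);
  /* arguments: steptol, maxstep, rtol, atol, ltol, max_its */
  ierr = SNESLineSearchSetTolerances(linesearch,1.e-12,1.e8,1.e-8,1.e-15,1.e-8,40);CHKERRQ(ierr);
  ierr = SNESLineSearchSetLambda(linesearch,1.0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}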
-- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 From bsmith at mcs.anl.gov Fri Jan 8 14:35:26 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 8 Jan 2016 14:35:26 -0600 Subject: [petsc-users] question about SNESLINESEARCHBT In-Reply-To: <56901BE1.7010605@purdue.edu> References: <56901BE1.7010605@purdue.edu> Message-ID: <828B8DC3-81B3-4EB9-9339-A4A834B08DF1@mcs.anl.gov> SNESGetLineSearch() then things like SNESLineSearchSetTolerances() SNESLineSearchSetLambda(), SNESLineSearchSetOrder() Barry > On Jan 8, 2016, at 2:28 PM, Michael Povolotskyi wrote: > > Dear Petsc developers and users, > I solve nonlinear systems with Newton Broyden method with Petsc. > I use line search algorithm of type SNESLINESEARCHBT. > Usually it works, but sometimes diverges. I want to change its parameters that are listed in > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLINESEARCHBT.html#SNESLINESEARCHBT > > Is there any way to set them in the code rather then from the command line? > Thank you, > Michael. > > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone (765) 4949396 > From tabrezali at gmail.com Mon Jan 11 09:41:42 2016 From: tabrezali at gmail.com (Tabrez Ali) Date: Mon, 11 Jan 2016 09:41:42 -0600 Subject: [petsc-users] METIS without C++ compiler Message-ID: <5693CD36.10205@gmail.com> Hello I just wanted to point that configure fails when "--with-metis=1 --download-metis=1" options are used and a C++ compiler is not installed. After changing "project(METIS)" to "project(METIS C)" in petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/CMakeLists.txt it works alright. Not sure if there is another way to suppress the check. Regards, Tabrez ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error configuring METIS with cmake Could not execute "cd /home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/build && /usr/bin/cmake .. -DCMAKE_INSTALL_PREFIX=/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_C_COMPILER="/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpicc" -DCMAKE_AR=/usr/bin/ar -DCMAKE_RANLIB=/usr/bin/ranlib -DCMAKE_C_FLAGS:STRING="-fPIC -O" -DCMAKE_Fortran_COMPILER="/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpif90" -DCMAKE_Fortran_FLAGS:STRING="-fPIC -ffree-line-length-0 -O" -DGKLIB_PATH=../GKlib -DSHARED=1 -DMETIS_USE_DOUBLEPRECISION=1": -- The C compiler identification is GNU 4.8.4 -- The CXX compiler identification is unknown -- Check for working C compiler: /home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpicc -- Check for working C compiler: /home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpicc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Looking for execinfo.h -- Looking for execinfo.h - found -- Looking for getline -- Looking for getline - found -- Performing Test HAVE__thread -- Performing Test HAVE__thread - Success -- checking for __thread thread-local storage - found -- Configuring incomplete, errors occurred! 
See also "/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/build/CMakeFiles/CMakeOutput.log". See also "/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/build/CMakeFiles/CMakeError.log".CMake Error: your CXX compiler: "CMAKE_CXX_COMPILER-NOTFOUND" was not found. Please set CMAKE_CXX_COMPILER to a valid compiler path or name. ******************************************************************************* From Shuangshuang.Jin at pnnl.gov Mon Jan 11 13:15:27 2016 From: Shuangshuang.Jin at pnnl.gov (Jin, Shuangshuang) Date: Mon, 11 Jan 2016 19:15:27 +0000 Subject: [petsc-users] PETSC_i Message-ID: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): double ival_re, ival_im; PetscScalar val; ... val = ival_re + PETSC_i * ival_im; I got a compilation error below: error: cannot convert 'std::complex' to 'PetscScalar {aka double}' in assignment Even if I set "val = 1.0 * PETSC_i;", the error stays the same. Can anyone help to evaluate the problem? Thanks, Shuangshuang -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jan 11 13:18:26 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 11 Jan 2016 12:18:26 -0700 Subject: [petsc-users] PETSC_i In-Reply-To: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> Message-ID: <87ziwbx6v1.fsf@jedbrown.org> "Jin, Shuangshuang" writes: > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): Looks like you're trying to compile with a different PETSc. Check PETSC_DIR and PETSC_ARCH. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From Shuangshuang.Jin at pnnl.gov Mon Jan 11 13:26:16 2016 From: Shuangshuang.Jin at pnnl.gov (Jin, Shuangshuang) Date: Mon, 11 Jan 2016 19:26:16 +0000 Subject: [petsc-users] PETSC_i In-Reply-To: <87ziwbx6v1.fsf@jedbrown.org> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> <87ziwbx6v1.fsf@jedbrown.org> Message-ID: <71FF54182841B443932BB8F835FD98531A59ACE3@EX10MBOX02.pnnl.gov> I didn't see anything wrong with my PETSC_DIR and PETSC_ARCH. Please see my setup below: setenv PETSC_DIR /pic/projects/software_new/petsc-3.6.0 setenv PETSC_ARCH linux-openmpi-gnu-cxx-complex-opt Thanks, Shuangshuang -----Original Message----- From: Jed Brown [mailto:jed at jedbrown.org] Sent: Monday, January 11, 2016 11:18 AM To: Jin, Shuangshuang; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC_i "Jin, Shuangshuang" writes: > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): Looks like you're trying to compile with a different PETSc. Check PETSC_DIR and PETSC_ARCH. From balay at mcs.anl.gov Mon Jan 11 13:48:29 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 11 Jan 2016 13:48:29 -0600 Subject: [petsc-users] PETSC_i In-Reply-To: <71FF54182841B443932BB8F835FD98531A59ACE3@EX10MBOX02.pnnl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> <87ziwbx6v1.fsf@jedbrown.org> <71FF54182841B443932BB8F835FD98531A59ACE3@EX10MBOX02.pnnl.gov> Message-ID: Are you sure its a complex build? 
Please send us configure.log or make.log for this build. Also send us test.log for this build. Satish On Mon, 11 Jan 2016, Jin, Shuangshuang wrote: > I didn't see anything wrong with my PETSC_DIR and PETSC_ARCH. > > Please see my setup below: > > setenv PETSC_DIR /pic/projects/software_new/petsc-3.6.0 > setenv PETSC_ARCH linux-openmpi-gnu-cxx-complex-opt > > Thanks, > Shuangshuang > > -----Original Message----- > From: Jed Brown [mailto:jed at jedbrown.org] > Sent: Monday, January 11, 2016 11:18 AM > To: Jin, Shuangshuang; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC_i > > "Jin, Shuangshuang" writes: > > > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): > > Looks like you're trying to compile with a different PETSc. Check PETSC_DIR and PETSC_ARCH. > From bsmith at mcs.anl.gov Mon Jan 11 13:50:33 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 11 Jan 2016 13:50:33 -0600 Subject: [petsc-users] PETSC_i In-Reply-To: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> Message-ID: <99A40BEA-136F-426D-A4E4-9969304C85A5@mcs.anl.gov> > On Jan 11, 2016, at 1:15 PM, Jin, Shuangshuang wrote: > > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): > > double ival_re, ival_im; > PetscScalar val; > ? > val = ival_re + PETSC_i * ival_im; > > I got a compilation error below: > > error: cannot convert 'std::complex' to 'PetscScalar {aka double}' in assignment For sure something is wrong. It definitely believes that PetscScalar is a double when it will be std::complex if all the ducks are in order. Did/does make test work after you installed PETSc? Barry > > Even if I set ?val = 1.0 * PETSC_i;?, the error stays the same. > > Can anyone help to evaluate the problem? > > Thanks, > Shuangshuang From Shuangshuang.Jin at pnnl.gov Mon Jan 11 14:59:00 2016 From: Shuangshuang.Jin at pnnl.gov (Jin, Shuangshuang) Date: Mon, 11 Jan 2016 20:59:00 +0000 Subject: [petsc-users] PETSC_i In-Reply-To: <99A40BEA-136F-426D-A4E4-9969304C85A5@mcs.anl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> <99A40BEA-136F-426D-A4E4-9969304C85A5@mcs.anl.gov> Message-ID: <71FF54182841B443932BB8F835FD98531A59AD2C@EX10MBOX02.pnnl.gov> Thanks, I reinstalled the PETSc to make sure the PetscScalar type is complex, and it works fine now. Shuangshuang -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Monday, January 11, 2016 11:51 AM To: Jin, Shuangshuang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC_i > On Jan 11, 2016, at 1:15 PM, Jin, Shuangshuang wrote: > > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): > > double ival_re, ival_im; > PetscScalar val; > ? > val = ival_re + PETSC_i * ival_im; > > I got a compilation error below: > > error: cannot convert 'std::complex' to 'PetscScalar {aka double}' in assignment For sure something is wrong. It definitely believes that PetscScalar is a double when it will be std::complex if all the ducks are in order. Did/does make test work after you installed PETSc? Barry > > Even if I set ?val = 1.0 * PETSC_i;?, the error stays the same. > > Can anyone help to evaluate the problem? 
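One way to catch this kind of mismatch at compile time, as a supplement to the checks suggested in this thread: PETSC_i is only meaningful when PetscScalar is complex, so guarding on PETSC_USE_COMPLEX fails immediately if PETSC_DIR/PETSC_ARCH point at a real-scalar build. A minimal sketch, with MakeComplex as an illustrative name:

#include <petscsys.h>

#if !defined(PETSC_USE_COMPLEX)
#  error "This code needs a PETSc built with --with-scalar-type=complex"
#endif

static PetscScalar MakeComplex(double ival_re,double ival_im)
{
  /* with a complex build this is complex arithmetic, so the assignment compiles as intended */
  return ival_re + PETSC_i*ival_im;
}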
> > Thanks, > Shuangshuang From gideon.simpson at gmail.com Mon Jan 11 15:26:35 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 11 Jan 2016 16:26:35 -0500 Subject: [petsc-users] SNES norm control Message-ID: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? -gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckhuangf at gmail.com Mon Jan 11 20:53:43 2016 From: ckhuangf at gmail.com (Chung-Kan Huang) Date: Mon, 11 Jan 2016 20:53:43 -0600 Subject: [petsc-users] KSPConvergedReason = KSP_CONVERGED_ITERATING Message-ID: Hi, I am encountering KSPSolve hanging with one process finished KSPSolve reporting KSPConvergedReason = KSP_CONVERGED_ITERATING while other processes stuck in KSPSolve. The problem is not seen when code was compiled in debug mode and problem only appears after more than 10 hours of run time with production mode. Can anyone suggest how I can do to debug this case? Thanks, Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 11 21:03:32 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 11 Jan 2016 21:03:32 -0600 Subject: [petsc-users] KSPConvergedReason = KSP_CONVERGED_ITERATING In-Reply-To: References: Message-ID: On Mon, Jan 11, 2016 at 8:53 PM, Chung-Kan Huang wrote: > > Hi, > > I am encountering KSPSolve hanging with one process finished > KSPSolve reporting KSPConvergedReason = KSP_CONVERGED_ITERATING while other > processes stuck in KSPSolve. > > The problem is not seen when code was compiled in debug mode and problem > only appears after more than 10 hours of run time with production mode. > > Can anyone suggest how I can do to debug this case? > Can you send the complete solver being used? Something like the output of -ksp_view. Are you using a custom PC? Matt > Thanks, > > Ken > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 11 22:45:19 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 11 Jan 2016 22:45:19 -0600 Subject: [petsc-users] KSPConvergedReason = KSP_CONVERGED_ITERATING In-Reply-To: References: Message-ID: <14100ECD-1510-4DD5-8580-3298948504A5@mcs.anl.gov> Hmm, KSPSolve() should never complete with a KSP_CONVERGED_ITERATING so something is definitely not going well. Have you run your code with valgrind to make sure there is not some subtle memory bug that only rears its ugly head after a great deal of time? http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind I would do this first. 
It is possible to use the -g flag even with optimized builds to get debug symbols even with optimization (in fact that is our new default) so depending on the machine you are running on and how many MPI processes you use it could be possible to simply run the run in the debugger (-start_in_debugger) and then come back the next day when one process returns and the others hang then control c the other processes and see where they are in the code. Barry > On Jan 11, 2016, at 8:53 PM, Chung-Kan Huang wrote: > > > Hi, > > I am encountering KSPSolve hanging with one process finished KSPSolve reporting KSPConvergedReason = KSP_CONVERGED_ITERATING while other processes stuck in KSPSolve. > > The problem is not seen when code was compiled in debug mode and problem only appears after more than 10 hours of run time with production mode. > > Can anyone suggest how I can do to debug this case? > > Thanks, > > Ken > From bsmith at mcs.anl.gov Mon Jan 11 23:14:49 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 11 Jan 2016 23:14:49 -0600 Subject: [petsc-users] SNES norm control In-Reply-To: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> Message-ID: <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. Barry > On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: > > I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? > > -gideon > From jed at jedbrown.org Tue Jan 12 00:04:51 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 11 Jan 2016 23:04:51 -0700 Subject: [petsc-users] METIS without C++ compiler In-Reply-To: <5693CD36.10205@gmail.com> References: <5693CD36.10205@gmail.com> Message-ID: <87k2nfwcxo.fsf@jedbrown.org> Tabrez Ali writes: > Hello > > I just wanted to point that configure fails when "--with-metis=1 > --download-metis=1" options are used and a C++ compiler is not installed. > > After changing "project(METIS)" to "project(METIS C)" in > petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/CMakeLists.txt > it works alright. Thanks, it looks like ParMETIS needs this too. I've pushed this change to the repositories for each. Satish can bump the patch number and point PETSc to the new version. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From gideon.simpson at gmail.com Tue Jan 12 07:14:53 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Jan 2016 08:14:53 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> Message-ID: <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. 
What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements F_i/|x_i| where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? -gideon > On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: > > > You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. > > Barry > >> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >> >> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >> >> -gideon >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 12 07:24:13 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 12 Jan 2016 07:24:13 -0600 Subject: [petsc-users] SNES norm control In-Reply-To: <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> Message-ID: <5B2AC30A-FF15-4FA5-B0E1-6FA5E5975FF2@mcs.anl.gov> > On Jan 12, 2016, at 7:14 AM, Gideon Simpson wrote: > > That seems to to allow for me to cook up a convergence test in terms of the 2 norm. No, why just the two norm? You can put whatever tests you want into your convergence test, including looking at F_i/|x_i| if you want. You need to call SNESGetSolution() and SNESGetFunction() from within your test routine to get the vectors you want to look at. Barry > What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements > > F_i/|x_i| > > where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? > > > -gideon > >> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >> >> >> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >> >> Barry >> >>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >>> >>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? 
>>> >>> -gideon >>> >> > From dave.mayhem23 at gmail.com Tue Jan 12 07:24:29 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 12 Jan 2016 14:24:29 +0100 Subject: [petsc-users] SNES norm control In-Reply-To: <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> Message-ID: On 12 January 2016 at 14:14, Gideon Simpson wrote: > That seems to to allow for me to cook up a convergence test in terms of > the 2 norm. > While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. See http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html Cheers, Dave > What I?m really looking for is the ability to change things to be > something like the 2 norm of the vector with elements > > F_i/|x_i| > > where I am looking for a root of F(x). I can just build that scaling into > the form function, but is there a way to do it without rewriting that piece > of the code? > > > -gideon > > On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: > > > You can use SNESSetConvergenceTest() to use whatever test you want to > decide on convergence. > > Barry > > On Jan 11, 2016, at 3:26 PM, Gideon Simpson > wrote: > > I?m solving nonlinear problem for a complex valued function which is > decomposed into real and imaginary parts, Q = u + i v. What I?m finding is > that where |Q| is small, the numerical phase errors tend to be larger. I > suspect this is because it?s using the 2-norm for convergence in the SNES, > so, where the solution is already, the phase errors are seen as small too. > Is there a way to use something more like an infinity norm with SNES, to > get more point wise control? > > -gideon > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Tue Jan 12 07:33:00 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Jan 2016 08:33:00 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> Message-ID: <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? I interpreted this as though I had to build by convergence test based on those values. -gideon > On Jan 12, 2016, at 8:24 AM, Dave May wrote: > > > > On 12 January 2016 at 14:14, Gideon Simpson > wrote: > That seems to to allow for me to cook up a convergence test in terms of the 2 norm. > > While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. 
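For instance, a bare-bones test along those lines (the tolerance, iteration cap, and names below are just placeholders; error checking trimmed):

#include <petscsnes.h>

PetscErrorCode MyConvergenceTest(SNES snes, PetscInt it, PetscReal xnorm, PetscReal gnorm, PetscReal fnorm, SNESConvergedReason *reason, void *cctx)
{
  Vec            x, F;
  PetscReal      finf;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = SNESGetSolution(snes, &x);CHKERRQ(ierr);
  ierr = SNESGetFunction(snes, &F, NULL, NULL);CHKERRQ(ierr);
  /* any measure you like; here the max norm of F, but you could first scale F entrywise by |x| */
  ierr = VecNorm(F, NORM_INFINITY, &finf);CHKERRQ(ierr);
  *reason = SNES_CONVERGED_ITERATING;
  if (finf < 1.e-8)  *reason = SNES_CONVERGED_FNORM_ABS;
  else if (it >= 50) *reason = SNES_DIVERGED_MAX_IT;
  PetscFunctionReturn(0);
}

/* registered with: SNESSetConvergenceTest(snes, MyConvergenceTest, NULL, NULL); */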
> > See > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html > > Cheers, > Dave > > > > > What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements > > F_i/|x_i| > > where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? > > > -gideon > >> On Jan 12, 2016, at 12:14 AM, Barry Smith > wrote: >> >> >> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >> >> Barry >> >>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson > wrote: >>> >>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>> >>> -gideon >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Jan 12 07:37:16 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 12 Jan 2016 14:37:16 +0100 Subject: [petsc-users] SNES norm control In-Reply-To: <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: On 12 January 2016 at 14:33, Gideon Simpson wrote: > I?m just a bit confused by the documentation > for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f > are passed in, at the current iterate, correct? > Yes, but nothing requires you to use them :D > I interpreted this as though I had to build by convergence test based on > those values. > This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). Cheers, Dave > > -gideon > > On Jan 12, 2016, at 8:24 AM, Dave May wrote: > > > > On 12 January 2016 at 14:14, Gideon Simpson > wrote: > >> That seems to to allow for me to cook up a convergence test in terms of >> the 2 norm. >> > > While you are only provided the 2 norm of F, you are also given access to > the SNES object. Thus inside your user convergence test function, you can > call SNESGetFunction() and SNESGetSolution(), then you can compute your > convergence criteria and set the converged reason to what ever you want. > > See > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html > > Cheers, > Dave > > > > > >> What I?m really looking for is the ability to change things to be >> something like the 2 norm of the vector with elements >> >> F_i/|x_i| >> >> where I am looking for a root of F(x). 
I can just build that scaling >> into the form function, but is there a way to do it without rewriting that >> piece of the code? >> >> >> -gideon >> >> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >> >> >> You can use SNESSetConvergenceTest() to use whatever test you want to >> decide on convergence. >> >> Barry >> >> On Jan 11, 2016, at 3:26 PM, Gideon Simpson >> wrote: >> >> I?m solving nonlinear problem for a complex valued function which is >> decomposed into real and imaginary parts, Q = u + i v. What I?m finding is >> that where |Q| is small, the numerical phase errors tend to be larger. I >> suspect this is because it?s using the 2-norm for convergence in the SNES, >> so, where the solution is already, the phase errors are seen as small too. >> Is there a way to use something more like an infinity norm with SNES, to >> get more point wise control? >> >> -gideon >> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Tue Jan 12 08:06:38 2016 From: gideon.simpson at gmail.com (gideon.simpson at gmail.com) Date: Tue, 12 Jan 2016 09:06:38 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: Do I have to manually code in the divergence criteria too? > On Jan 12, 2016, at 8:37 AM, Dave May wrote: > > > >> On 12 January 2016 at 14:33, Gideon Simpson wrote: >> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? > > Yes, but nothing requires you to use them :D > >> I interpreted this as though I had to build by convergence test based on those values. > > This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. > > xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). > > Cheers, > Dave > >> >> -gideon >> >>> On Jan 12, 2016, at 8:24 AM, Dave May wrote: >>> >>> >>> >>> On 12 January 2016 at 14:14, Gideon Simpson wrote: >>>> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. >>> >>> While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. >>> >>> See >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >>> >>> Cheers, >>> Dave >>> >>> >>> >>> >>>> What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements >>>> >>>> F_i/|x_i| >>>> >>>> where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? 
>>>> >>>> >>>> -gideon >>>> >>>>> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >>>>> >>>>> >>>>> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >>>>> >>>>> Barry >>>>> >>>>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >>>>>> >>>>>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>>>>> >>>>>> -gideon > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Jan 12 08:17:23 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 12 Jan 2016 15:17:23 +0100 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: On 12 January 2016 at 15:06, wrote: > Do I have to manually code in the divergence criteria too? > Yes. By calling SNESSetConvergenceTest() you are replacing the default SNES convergence test function which will get called at each SNES iteration, therefore you are responsible for defining all reasons for convergence and divergence. To make life easy, you could copy everything in the funciton SNESConvergedDefault(), http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault and just replace the rule for SNES_CONVERGED_FNORM_RELATIVE with your custom scaled stopping condition. > > On Jan 12, 2016, at 8:37 AM, Dave May wrote: > > > > On 12 January 2016 at 14:33, Gideon Simpson > wrote: > >> I?m just a bit confused by the documentation >> for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f >> are passed in, at the current iterate, correct? >> > > Yes, but nothing requires you to use them :D > > >> I interpreted this as though I had to build by convergence test based on >> those values. >> > > This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm > and define any crazy stopping condition you like. > > xnorm, gnorm and fnorm are commonly required for many stopping conditions > and are computed by the snes methods. As such, are readily available and > for efficiency and convenience they are provided to the user (e.g. to avoid > you having to re-compute norms). > > Cheers, > Dave > > >> >> -gideon >> >> On Jan 12, 2016, at 8:24 AM, Dave May wrote: >> >> >> >> On 12 January 2016 at 14:14, Gideon Simpson >> wrote: >> >>> That seems to to allow for me to cook up a convergence test in terms of >>> the 2 norm. >>> >> >> While you are only provided the 2 norm of F, you are also given access to >> the SNES object. Thus inside your user convergence test function, you can >> call SNESGetFunction() and SNESGetSolution(), then you can compute your >> convergence criteria and set the converged reason to what ever you want. 
>> >> See >> >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >> >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >> >> Cheers, >> Dave >> >> >> >> >> >>> What I'm really looking for is the ability to change things to be >>> something like the 2 norm of the vector with elements >>> >>> F_i/|x_i| >>> >>> where I am looking for a root of F(x). I can just build that scaling >>> into the form function, but is there a way to do it without rewriting that >>> piece of the code? >>> >>> >>> -gideon >>> >>> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >>> >>> >>> You can use SNESSetConvergenceTest() to use whatever test you want to >>> decide on convergence. >>> >>> Barry >>> >>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson >>> wrote: >>> >>> I'm solving nonlinear problem for a complex valued function which is >>> decomposed into real and imaginary parts, Q = u + i v. What I'm finding is >>> that where |Q| is small, the numerical phase errors tend to be larger. I >>> suspect this is because it's using the 2-norm for convergence in the SNES, >>> so, where the solution is already, the phase errors are seen as small too. >>> Is there a way to use something more like an infinity norm with SNES, to >>> get more point wise control? >>> >>> -gideon >>> >>> >>> >>> >> >> > From borisbou at buffalo.edu Tue Jan 12 09:37:50 2016 From: borisbou at buffalo.edu (Boris Boutkov) Date: Tue, 12 Jan 2016 10:37:50 -0500 Subject: [petsc-users] Providing context to DMShell Message-ID: <56951DCE.2060401@buffalo.edu> Hello All, I'm trying to attach a context to the DMShell similarly to how a context is passed into SNES through the SetFunction routine. Specifically, I'm looking to provide my own interpolation routine to both DMShellSetCreateInterpolation and Injection, which requires some user data from my environment. I've tried searching around the _p_DM* struct looking for somewhere to attach this data but found no convenient way, any pointers to how I could achieve this would be appreciated. Thanks for your time, Boris Boutkov From lawrence.mitchell at imperial.ac.uk Tue Jan 12 09:44:01 2016 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 12 Jan 2016 15:44:01 +0000 Subject: [petsc-users] Providing context to DMShell In-Reply-To: <56951DCE.2060401@buffalo.edu> References: <56951DCE.2060401@buffalo.edu> Message-ID: <56951F41.2080007@imperial.ac.uk> On 12/01/16 15:37, Boris Boutkov wrote: > Hello All, > > I'm trying to attach a context to the DMShell similarly to how a context > is passed into SNES through the SetFunction routine. Specifically, I'm > looking to provide my own interpolation routine to both > DMShellSetCreateInterpolation and Injection, which requires some user > data from my environment. I've tried searching around the _p_DM* struct > looking for somewhere to attach this data but found no convenient way, > any pointers to how I could achieve this would be appreciated. I think you want to do: DMSetApplicationContext(dm, user_context); Inside your interpolation routine you can then use: PetscErrorCode interpolate(DM coarse, DM fine, Mat *m, Vec *v) { ... DMGetApplicationContext(coarse, &ctx); ... } Cheers, Lawrence
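A rough sketch of wiring that together with the DMShell hooks (the context struct, its fields, and the setup order below are only illustrative, not taken from this thread):

#include <petscdmshell.h>

typedef struct {
  PetscInt  nlevels;     /* whatever your environment needs inside the hooks */
  void     *user_data;
} MyShellCtx;

static PetscErrorCode MyCreateInterpolation(DM coarse, DM fine, Mat *mat, Vec *vec)
{
  MyShellCtx     *ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DMGetApplicationContext(coarse, &ctx);CHKERRQ(ierr);
  /* build the interpolation matrix in *mat (and optionally the scaling in *vec) using ctx */
  PetscFunctionReturn(0);
}

  /* ... during setup ... */
  ierr = DMShellCreate(comm, &dm);CHKERRQ(ierr);
  ierr = DMSetApplicationContext(dm, &my_ctx);CHKERRQ(ierr);
  ierr = DMShellSetCreateInterpolation(dm, MyCreateInterpolation);CHKERRQ(ierr);

The same context is then reachable from any other callback that is handed the DM.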
From balay at mcs.anl.gov Tue Jan 12 11:18:51 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 12 Jan 2016 11:18:51 -0600 Subject: [petsc-users] METIS without C++ compiler In-Reply-To: <87k2nfwcxo.fsf@jedbrown.org> References: <5693CD36.10205@gmail.com> <87k2nfwcxo.fsf@jedbrown.org> Message-ID: On Tue, 12 Jan 2016, Jed Brown wrote: > Tabrez Ali writes: > > > Hello > > > > I just wanted to point that configure fails when "--with-metis=1 > > --download-metis=1" options are used and a C++ compiler is not installed. > > > > After changing "project(METIS)" to "project(METIS C)" in > > petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/CMakeLists.txt > > it works alright. > > Thanks, it looks like ParMETIS needs this too. I've pushed this change > to the repositories for each. Satish can bump the patch number and > point PETSc to the new version. > spun the new patched tarballs - and added the change in 'balay/to-maint-metis-parmetis-nocxx' and merged to next - for now. Satish From kshyatt at physics.ucsb.edu Tue Jan 12 15:20:05 2016 From: kshyatt at physics.ucsb.edu (Katharine Hyatt) Date: Tue, 12 Jan 2016 13:20:05 -0800 Subject: [petsc-users] HDF5Viewer only on worker 0? Message-ID: <43101E79-FA47-4546-A3C7-88916DDDF023@physics.ucsb.edu> Hello, I'm trying to use PETSc's HDF5Viewers on a system that doesn't support parallel HDF5. When I tried naively using PetscViewer hdf5viewer; PetscViewerHDF5Open( PETSC_COMM_WORLD, filename, FILE_MODE_WRITE, &hdf5viewer); I get a segfault because ADIOI can't lock. So I switched to using the binary format, which routes everything through one CPU. Then my job can output successfully. But I would like to use HDF5 without any intermediate steps, and reading the documentation it was unclear to me if it is possible to ask for behavior similar to the binary viewers from the HDF5 ones - everyone sends their information to worker 0, who then does single-process I/O. Is this possible? Thanks, Katharine From mfadams at lbl.gov Tue Jan 12 17:48:36 2016 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 12 Jan 2016 15:48:36 -0800 Subject: [petsc-users] osx configuration error Message-ID: I did nuke the arch directory. This has worked in the past and don't know what I might have changed. it's been awhile since I've reconfigured. Thanks, Mark -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 339545 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jan 12 18:30:13 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 12 Jan 2016 18:30:13 -0600 Subject: [petsc-users] HDF5Viewer only on worker 0?
In-Reply-To: <43101E79-FA47-4546-A3C7-88916DDDF023@physics.ucsb.edu> References: <43101E79-FA47-4546-A3C7-88916DDDF023@physics.ucsb.edu> Message-ID: <9D1A2757-BAD2-4638-9FFB-92608570F017@mcs.anl.gov> Katherine, Assuming the vectors are not so large that the entire thing cannot fit on the first process you could do something like VecScatterCreateToZero(vec, &scatter,&veczero); VecScatterBegin/End(scatter,vec,veczero); if (!rank) { > PetscViewer hdf5viewer; > PetscViewerHDF5Open( PETSC_COMM_SELF, filename, FILE_MODE_WRITE, &hdf5viewer); VecView(veczero,hdf5viewer); } Note that if your vec came from a DMDA then you need to first do a DMDAGlobalToNaturalBegin/End() to get a vector in the right ordering to pass to VecScatterCreateToZero(). On the other hand if the vectors are enormous and cannot fit on one process it would be more involved. Essentially you would need to copy VecView_MPI_Binary() and modify it to write out to HDF5 a part at a time instead of the binary format it does now. Barry > On Jan 12, 2016, at 3:20 PM, Katharine Hyatt wrote: > > Hello, > > I'm trying to use PETSc's HDF5Viewers on a system that doesn't support parallel HDF5. When I tried naively using > > PetscViewer hdf5viewer; > PetscViewerHDF5Open( PETSC_COMM_WORLD, filename, FILE_MODE_WRITE, &hdf5viewer); > > I get a segfault because ADIOI can't lock. So I switched to using the binary format, which routes everything through one CPU. Then my job can output successfully. But I would like to use HDF5 without any intermediate steps, and reading the documentation it was unclear to me if it is possible to ask for behavior similar to the binary viewers from the HDF5 ones - everyone sends their information to worker 0, who then does single-process I/O. Is this possible? > > Thanks, > Katharine From balay at mcs.anl.gov Tue Jan 12 18:31:49 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 12 Jan 2016 18:31:49 -0600 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: > 'file' object has no attribute 'getvalue' File "/Users/markadams/Codes/petsc/config/configure.py", line 363, in petsc_configure Hm - have to figure this one out - but the primary issue is: > stderr: > gfortran: warning: couldn't understand kern.osversion '15.2.0 > ld: -rpath can only be used when targeting Mac OS X 10.5 or later Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. > Executing: mpif90 --version > stdout: > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 I suggest uninstalling/reinstalling homebrew packages. Satish On Tue, 12 Jan 2016, Mark Adams wrote: > I did nuke the arch directory. This has worked in the past and don't know > what I might have changed. it's been awhile since I've reconfigured. > Thanks, > Mark > From gideon.simpson at gmail.com Tue Jan 12 20:19:26 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Jan 2016 21:19:26 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: Got it. I'm trying to build up my desired convergence test, based on the default routine.
I?m getting the following compiler error, which I don?t entirely understand: blowup_utils.c:180:9: error: incomplete definition of type 'struct _p_SNES' snes->ttol = fnorm_scaled*snes->rtol; ~~~~^ /opt/petsc/include/petscsnes.h:20:16: note: forward declaration of 'struct _p_SNES' typedef struct _p_SNES* SNES; Separately, is there a way to get the step vector? -gideon > On Jan 12, 2016, at 9:17 AM, Dave May wrote: > > > > On 12 January 2016 at 15:06, > wrote: > Do I have to manually code in the divergence criteria too? > > Yes. > > By calling SNESSetConvergenceTest() you are replacing the default SNES convergence test function which will get called at each SNES iteration, therefore you are responsible for defining all reasons for convergence and divergence. > > To make life easy, you could copy everything in the funciton SNESConvergedDefault(), > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault > > and just replace the rule for > SNES_CONVERGED_FNORM_RELATIVE > with your custom scaled stopping condition. > > > > > > > On Jan 12, 2016, at 8:37 AM, Dave May > wrote: > >> >> >> On 12 January 2016 at 14:33, Gideon Simpson > wrote: >> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? >> >> Yes, but nothing requires you to use them :D >> >> I interpreted this as though I had to build by convergence test based on those values. >> >> This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. >> >> xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). >> >> Cheers, >> Dave >> >> >> -gideon >> >>> On Jan 12, 2016, at 8:24 AM, Dave May > wrote: >>> >>> >>> >>> On 12 January 2016 at 14:14, Gideon Simpson > wrote: >>> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. >>> >>> While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. >>> >>> See >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >>> >>> Cheers, >>> Dave >>> >>> >>> >>> >>> What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements >>> >>> F_i/|x_i| >>> >>> where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? >>> >>> >>> -gideon >>> >>>> On Jan 12, 2016, at 12:14 AM, Barry Smith > wrote: >>>> >>>> >>>> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >>>> >>>> Barry >>>> >>>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson > wrote: >>>>> >>>>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. 
I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>>>> >>>>> -gideon >>>>> >>>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 12 21:55:27 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 12 Jan 2016 21:55:27 -0600 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: <140D0B19-E93F-4D7D-9EDD-D0B6CB060656@mcs.anl.gov> > On Jan 12, 2016, at 8:19 PM, Gideon Simpson wrote: > > Got it. I?m trying to build up my desired convergence test, based on the default routine. I?m getting the following compiler error, which I don?t entirely understand: > > blowup_utils.c:180:9: error: > incomplete definition of type 'struct _p_SNES' > snes->ttol = fnorm_scaled*snes->rtol; > ~~~~^ > /opt/petsc/include/petscsnes.h:20:16: note: forward declaration of > 'struct _p_SNES' > typedef struct _p_SNES* SNES; Since you are accessing the internals of SNES you need to include > > Separately, is there a way to get the step vector? SNESGetSolutionUpdate() > > -gideon > >> On Jan 12, 2016, at 9:17 AM, Dave May wrote: >> >> >> >> On 12 January 2016 at 15:06, wrote: >> Do I have to manually code in the divergence criteria too? >> >> Yes. >> >> By calling SNESSetConvergenceTest() you are replacing the default SNES convergence test function which will get called at each SNES iteration, therefore you are responsible for defining all reasons for convergence and divergence. >> >> To make life easy, you could copy everything in the funciton SNESConvergedDefault(), >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault >> >> and just replace the rule for >> SNES_CONVERGED_FNORM_RELATIVE >> with your custom scaled stopping condition. >> >> >> >> >> >> >> On Jan 12, 2016, at 8:37 AM, Dave May wrote: >> >>> >>> >>> On 12 January 2016 at 14:33, Gideon Simpson wrote: >>> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? >>> >>> Yes, but nothing requires you to use them :D >>> >>> I interpreted this as though I had to build by convergence test based on those values. >>> >>> This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. >>> >>> xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). >>> >>> Cheers, >>> Dave >>> >>> >>> -gideon >>> >>>> On Jan 12, 2016, at 8:24 AM, Dave May wrote: >>>> >>>> >>>> >>>> On 12 January 2016 at 14:14, Gideon Simpson wrote: >>>> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. >>>> >>>> While you are only provided the 2 norm of F, you are also given access to the SNES object. 
Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. >>>> >>>> See >>>> >>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >>>> >>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >>>> >>>> Cheers, >>>> Dave >>>> >>>> >>>> >>>> >>>> What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements >>>> >>>> F_i/|x_i| >>>> >>>> where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? >>>> >>>> >>>> -gideon >>>> >>>>> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >>>>> >>>>> >>>>> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >>>>> >>>>> Barry >>>>> >>>>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >>>>>> >>>>>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>>>>> >>>>>> -gideon >>>>>> >>>>> >>>> >>>> >>> >>> >> > From hgbk2008 at gmail.com Wed Jan 13 03:34:02 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Wed, 13 Jan 2016 10:34:02 +0100 Subject: [petsc-users] error on MatZeroRowsColumns Message-ID: Dear PETSc developers I got an error with MatZeroRowsColumns, which said there was one missing diagonal entries This is the full log message that I got: Mat Object: 2 MPI processes type: mpiaij rows=41064, cols=41064, bs=4 total: nonzeros=5.66069e+06, allocated nonzeros=1.28112e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 5647 nodes, limit used is 5 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 7 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 [0]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed Jan 13 10:27:42 2016 [0]PETSC ERROR: Configure options --with-shared-libraries --with-debugging=0 --with-pic --with-clanguage=cxx --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes --download-mumps=yes --download-hypre=yes --download-ml=yes --download-klu=yes --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple --prefix=/home/hbui/opt/petsc-3.6.3 [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Object is in wrong state [1]PETSC ERROR: Matrix is missing diagonal entry in row 7 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 [1]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed Jan 13 10:27:42 2016 [1]PETSC ERROR: Configure options --with-shared-libraries --with-debugging=0 --with-pic --with-clanguage=cxx --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes --download-mumps=yes --download-hypre=yes --download-ml=yes --download-klu=yes --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple --prefix=/home/hbui/opt/petsc-3.6.3 [1]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c [1]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c [1]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c [1]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c The problem is, before calling MatZeroRowsColumns, I searched for zero rows and set the respective diagonal: PetscInt Istart, Iend; MatGetOwnershipRange(rA.Get(), &Istart, &Iend); // loop through each row in the current partition for(PetscInt row = Istart; row < Iend; ++row) { int ncols; const PetscInt* cols; const PetscScalar* vals; MatGetRow(rA.Get(), row, &ncols, &cols, &vals); PetscScalar row_norm = 0.0; for(int i = 0; i < ncols; ++i) { PetscScalar val = vals[i]; row_norm += pow(val, 2); } row_norm = sqrt(row_norm); if(row_norm == 0.0) { for(int i = 0; i < ncols; ++i) { PetscInt col = cols[i]; if(col == row) { MatSetValue(rA.Get(), row, col, 1.0, INSERT_VALUES); } } } MatRestoreRow(rA.Get(), row, &ncols, &cols, &vals); } // cached the modification MatAssemblyBegin(rA.Get(), MAT_FINAL_ASSEMBLY); MatAssemblyEnd(rA.Get(), MAT_FINAL_ASSEMBLY); This should set all the missing diagonal for missing rows. But I could not figure out why the error message above happen. Any ideas? Regards Giang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Jan 13 04:15:53 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 13 Jan 2016 04:15:53 -0600 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay wrote: > > 'file' object has no attribute 'getvalue' File > "/Users/markadams/Codes/petsc/config/configure.py", line 363, in > petsc_configure > > Hm - have to figure this one out - but the primary issue is: > > > stderr: > > gfortran: warning: couldn't understand kern.osversion '15.2.0 > > ld: -rpath can only be used when targeting Mac OS X 10.5 or later > I get this. The remedy I use is to put MACOSX_DEPLOYMENT_TARGET=10.5 in the environment. Its annoying, and quintessentially Mac. Matt > Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. > > > Executing: mpif90 --version > > stdout: > > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 > > I suggest uninstalling/reinstalling homebrew packages. > > Satish > > > > On Tue, 12 Jan 2016, Mark Adams wrote: > > > I did nuke the arch directory. This has worked in the past and don't > know > > what I might have changed. it's been awhile since I've reconfigured. > > Thanks, > > Mark > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 13 04:17:36 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 13 Jan 2016 04:17:36 -0600 Subject: [petsc-users] error on MatZeroRowsColumns In-Reply-To: References: Message-ID: On Wed, Jan 13, 2016 at 3:34 AM, Hoang Giang Bui wrote: > Dear PETSc developers > > I got an error with MatZeroRowsColumns, which said there was one missing > diagonal entries > > This is the full log message that I got: > > Mat Object: 2 MPI processes > type: mpiaij > rows=41064, cols=41064, bs=4 > total: nonzeros=5.66069e+06, allocated nonzeros=1.28112e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 5647 nodes, limit used is 5 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 7 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > [0]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed > Jan 13 10:27:42 2016 > [0]PETSC ERROR: Configure options --with-shared-libraries > --with-debugging=0 --with-pic --with-clanguage=cxx > --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes > --download-parmetis=yes --download-scalapack=yes --download-mumps=yes > --download-hypre=yes --download-ml=yes --download-klu=yes > --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple > --prefix=/home/hbui/opt/petsc-3.6.3 > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Object is in wrong state > [1]PETSC ERROR: Matrix is missing diagonal entry in row 7 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > [1]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed > Jan 13 10:27:42 2016 > [1]PETSC ERROR: Configure options --with-shared-libraries > --with-debugging=0 --with-pic --with-clanguage=cxx > --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes > --download-parmetis=yes --download-scalapack=yes --download-mumps=yes > --download-hypre=yes --download-ml=yes --download-klu=yes > --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple > --prefix=/home/hbui/opt/petsc-3.6.3 > [1]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c > [1]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > > > The problem is, before calling MatZeroRowsColumns, I searched for zero > rows and set the respective diagonal: > > PetscInt Istart, Iend; > MatGetOwnershipRange(rA.Get(), &Istart, &Iend); > > // loop through each row in the current partition > for(PetscInt row = Istart; row < Iend; ++row) > { > int ncols; > const PetscInt* cols; > const PetscScalar* vals; > MatGetRow(rA.Get(), row, &ncols, &cols, &vals); > > PetscScalar row_norm = 0.0; > for(int i = 0; i < ncols; ++i) > { > PetscScalar val = vals[i]; > row_norm += pow(val, 2); > } > row_norm = sqrt(row_norm); > > if(row_norm == 0.0) > { > for(int i = 0; i < ncols; ++i) > { > PetscInt col = cols[i]; > if(col == row) > { > MatSetValue(rA.Get(), row, col, 1.0, > INSERT_VALUES); > } > } > } > > MatRestoreRow(rA.Get(), row, &ncols, &cols, &vals); > } > > // cached the modification > MatAssemblyBegin(rA.Get(), MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(rA.Get(), MAT_FINAL_ASSEMBLY); > > This should set all the missing diagonal for missing rows. But I could not > figure out why the error message above happen. Any ideas? 
> My guess would be that you have a row missing the diagonal, but with other entries. Matt > Regards > Giang > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From damon at ices.utexas.edu Wed Jan 13 10:14:02 2016 From: damon at ices.utexas.edu (Damon McDougall) Date: Wed, 13 Jan 2016 10:14:02 -0600 Subject: [petsc-users] The 7th Annual Scientific Software Days Conference Message-ID: <1452701642.2145471.491082282.5126827B@webmail.messagingengine.com> The 7th Annual Scientific Software Days Conference (SSD) targets users and developers of scientific software. The conference will be held at the University of Texas at Austin Thursday Feb 25 - Friday Feb 26, 2016 and focuses on two themes: a) sharing best practices across scientific software communities; b) sharing the latest tools and technology relevant to scientific software. Past keynotes speakers include Greg Wilson (2008), Victoria Stodden (2009), Steve Easterbrook (2010), Fernando Perez (2011), Will Schroeder (2012), Neil Chue Hong (2013). This year's list of speakers include: - Brian Adams (Sandia, Dakota): http://www.sandia.gov/~briadam/index.html - Jed Brown (CU Boulder, PETSc): https://jedbrown.org/ - Tim Davis (TAMU, SuiteSparse): http://faculty.cse.tamu.edu/davis/welcome.html - Iain Dunning (MIT, Julia Project): http://iaindunning.com/ - Victor Eijkhout (TACC): http://pages.tacc.utexas.edu/~eijkhout/ - Robert van de Geijn (keynote, UT Austin, libflame): https://www.cs.utexas.edu/users/rvdg/ - Jeff Hammond (Intel, nwchem): https://jeffhammond.github.io/ - Mark Hoemmen (keynote, Sandia, Trilinos): https://plus.google.com/+MarkHoemmen - James Howison (UT Austin): http://james.howison.name/ - Fernando Perez (Berkeley, IPython): http://fperez.org/ - Cory Quammen (Kitware, Paraview/VTK): http://www.kitware.com/company/team/quammen.html - Ridgway Scott (UChicago, FEniCS): http://people.cs.uchicago.edu/~ridg/ - Roy Stogner (UT Austin, LibMesh): https://scholar.google.com/citations?user=XcurJI0AAAAJ In addition, we solicit poster submissions that share novel uses of scientific software. Please send an abstract of less than 250 words to ssd-organizers at googlegroups.com. Limited travel funding for students and early career researchers who present posters will be available. Early-bird registration fees (before Feb 10th): Students: $35 Everyone else: $50 Late registration fees (Feb 10th onwards): Students: $55 Everyone else: $70 Register here: http://scisoftdays.org/ Regards, S. Fomel (UTexas), T. Isaac (UChicago), M. Knepley (Rice), R. Kirby (Baylor), Y. Lai (UTexas), K. Long (Texas Tech), D. McDougall (UTexas), J. Stewart (Sandia) From mfadams at lbl.gov Wed Jan 13 11:43:09 2016 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 13 Jan 2016 09:43:09 -0800 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: I'm still having problems. I have upgraded gcc and mpich. I am now upgrading everything from homebrew. Any ideas on this error? 
thanks, On Wed, Jan 13, 2016 at 2:15 AM, Matthew Knepley wrote: > On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay wrote: > >> > 'file' object has no attribute 'getvalue' File >> "/Users/markadams/Codes/petsc/config/configure.py", line 363, in >> petsc_configure >> >> Hm - have to figure this one out - but the primary issue is: >> >> > stderr: >> > gfortran: warning: couldn't understand kern.osversion '15.2.0 >> > ld: -rpath can only be used when targeting Mac OS X 10.5 or later >> > > I get this. The remedy I use is to put > > MACOSX_DEPLOYMENT_TARGET=10.5 > > in the environment. Its annoying, and quintessentially Mac. > > Matt > > >> Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. >> >> > Executing: mpif90 --version >> > stdout: >> > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 >> >> I suggest uninstalling/reinstalling homebrew packages. >> >> Satish >> >> >> >> On Tue, 12 Jan 2016, Mark Adams wrote: >> >> > I did nuke the arch directory. This has worked in the past and don't >> know >> > what I might have changed. it's been awhile since I've reconfigured. >> > Thanks, >> > Mark >> > >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 266713 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Jan 13 11:49:21 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 13 Jan 2016 11:49:21 -0600 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: >>>>>>>> Executing: mpif90 -o /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest.o Testing executable /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest to see if it can be run Executing: /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest Executing: /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest ERROR while running executable: Could not execute "/var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest": dyld: Library not loaded: /Users/markadams/homebrew/lib/gcc/x86_64-apple-darwin13.4.0/4.9.1/libgfortran.3.dylib Referenced from: /Users/markadams/homebrew/lib/libmpifort.12.dylib Reason: image not found <<<<<<<<<<< Mostlikely you haven't reinstalled mpich - as its refering to gfortran-4.9.1. Current gfortran is 5.3 GNU Fortran (Homebrew gcc 5.3.0) 5.3.0 This is what I would do to reinstall brew 1. Make list of pkgs to reinstall brew leaves > reinstall.lst 2. delete all installed brew pacakges. brew cleanup brew list > delete.lst brew remove `cat delete.lst 3. Now reinstall all required packages brew update brew install `cat reinstall.lst` Satish On Wed, 13 Jan 2016, Mark Adams wrote: > I'm still having problems. I have upgraded gcc and mpich. I am now > upgrading everything from homebrew. Any ideas on this error? 
> thanks, > > On Wed, Jan 13, 2016 at 2:15 AM, Matthew Knepley wrote: > > > On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay wrote: > > > >> > 'file' object has no attribute 'getvalue' File > >> "/Users/markadams/Codes/petsc/config/configure.py", line 363, in > >> petsc_configure > >> > >> Hm - have to figure this one out - but the primary issue is: > >> > >> > stderr: > >> > gfortran: warning: couldn't understand kern.osversion '15.2.0 > >> > ld: -rpath can only be used when targeting Mac OS X 10.5 or later > >> > > > > I get this. The remedy I use is to put > > > > MACOSX_DEPLOYMENT_TARGET=10.5 > > > > in the environment. Its annoying, and quintessentially Mac. > > > > Matt > > > > > >> Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. > >> > >> > Executing: mpif90 --version > >> > stdout: > >> > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 > >> > >> I suggest uninstalling/reinstalling homebrew packages. > >> > >> Satish > >> > >> > >> > >> On Tue, 12 Jan 2016, Mark Adams wrote: > >> > >> > I did nuke the arch directory. This has worked in the past and don't > >> know > >> > what I might have changed. it's been awhile since I've reconfigured. > >> > Thanks, > >> > Mark > >> > > >> > >> > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > From mhasan8 at vols.utk.edu Wed Jan 13 13:13:33 2016 From: mhasan8 at vols.utk.edu (Hasan, Fahad) Date: Wed, 13 Jan 2016 19:13:33 +0000 Subject: [petsc-users] ODE Solver on multiple cores Message-ID: Hello, I have written a code to solve a simple differential equation (x''+x'+6x=0 with initial values, x(0)=2, x'(0)=3). It works well on a single core and produces result close to theoretical answer but whenever I am trying to run the same code on multiple cores, I am getting incorrect results. It seems to me that, for multiple cores it stops after taking only 2 steps regardless of the final time and gives the final result (which is inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN etc.) but I always ended up with the same issue. Can you tell me what possibly may cause this problem? Thanks in advance. Regards, Fahad From hzhang at mcs.anl.gov Wed Jan 13 13:28:29 2016 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 13 Jan 2016 13:28:29 -0600 Subject: [petsc-users] ODE Solver on multiple cores In-Reply-To: References: Message-ID: Fahad: Run your code with '-ts_view' to see what solvers are being used for sequential and parallel runs. Hong Hello, > > > > I have written a code to solve a simple differential equation (x''+x'+6x=0 > with initial values, x(0)=2, x'(0)=3). It works well on a single core and > produces result close to theoretical answer but whenever I am trying to run > the same code on multiple cores, I am getting incorrect results. It seems > to me that, for multiple cores it stops after taking only 2 steps > regardless of the final time and gives the final result (which is > inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN > etc.) but I always ended up with the same issue. > > > > Can you tell me what possibly may cause this problem? Thanks in advance. > > > > Regards, > > Fahad >
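For reference, x''+x'+6x=0 can be recast as the first-order system u' = v, v' = -v - 6u with u(0)=2, v(0)=3, which is the form TS works with. A bare-bones RHS routine for that system might look like the following (names are illustrative, and no parallel decomposition is handled here):

#include <petscts.h>

static PetscErrorCode MyRHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
{
  const PetscScalar *u;
  PetscScalar       *f;
  PetscErrorCode    ierr;

  PetscFunctionBegin;
  ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecGetArray(F, &f);CHKERRQ(ierr);
  f[0] = u[1];              /* x' = v       */
  f[1] = -u[1] - 6.0*u[0];  /* v' = -v - 6x */
  ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* registered with: TSSetRHSFunction(ts, NULL, MyRHSFunction, NULL); */

On more than one process the two entries of U end up on different ranks, so indexing them directly like this no longer works; that is where a DM (or explicit scatters) comes in, as the replies below point out.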
From bsmith at mcs.anl.gov Wed Jan 13 13:35:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 13:35:39 -0600 Subject: [petsc-users] ODE Solver on multiple cores In-Reply-To: References: Message-ID: Likely there is something wrong with the IFunction or RHSFunction or their Jacobians that you provide in parallel. For the example you are running the easiest way to manage the parallelism of the data is with a DMDACreate1d(). Otherwise you need to manage the ghost point communication yourself by setting up VecScatters. Barry > On Jan 13, 2016, at 1:13 PM, Hasan, Fahad wrote: > > Hello, > > I have written a code to solve a simple differential equation (x''+x'+6x=0 with initial values, x(0)=2, x'(0)=3). It works well on a single core and produces result close to theoretical answer but whenever I am trying to run the same code on multiple cores, I am getting incorrect results. It seems to me that, for multiple cores it stops after taking only 2 steps regardless of the final time and gives the final result (which is inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN etc.) but I always ended up with the same issue. > > Can you tell me what possibly may cause this problem? Thanks in advance. > > Regards, > Fahad From hongzhang at anl.gov Wed Jan 13 14:02:45 2016 From: hongzhang at anl.gov (Hong Zhang) Date: Wed, 13 Jan 2016 14:02:45 -0600 Subject: [petsc-users] ODE Solver on multiple cores In-Reply-To: References: Message-ID: <84E686A1-AFEA-4C2C-8AB2-C0A127B89029@anl.gov> If x is just a scalar, it would not be a surprise that the code does not run in parallel. If x is a vector, you need a DM object to handle the decomposition. Hong On Jan 13, 2016, at 1:13 PM, Hasan, Fahad wrote: > Hello, > > I have written a code to solve a simple differential equation (x''+x'+6x=0 with initial values, x(0)=2, x'(0)=3). It works well on a single core and produces result close to theoretical answer but whenever I am trying to run the same code on multiple cores, I am getting incorrect results. It seems to me that, for multiple cores it stops after taking only 2 steps regardless of the final time and gives the final result (which is inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN etc.) but I always ended up with the same issue. > > Can you tell me what possibly may cause this problem? Thanks in advance. > > Regards, > Fahad From david.knezevic at akselos.com Wed Jan 13 14:48:57 2016 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 13 Jan 2016 15:48:57 -0500 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel Message-ID: I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear PDE solve. It converges well when I use 1 core. When I use 2 or more cores, the line search stagnates. I've pasted the output of -snes_linesearch_monitor below in these two cases. I was wondering if this implies that I must have a bug in parallel, or if perhaps the NEWTONLS solver can behave slightly differently in parallel?
Thanks, David --------------------------------------------------------------------------------------- *Serial case:* NL step 0, |residual|_2 = 4.714515e-02 Line search: gnorm after quadratic fit 7.862867755323e-02 Line search: Cubically determined step, current gnorm 4.663945043239e-02 lambda=1.4276549921126183e-02 NL step 1, |residual|_2 = 4.663945e-02 Line search: gnorm after quadratic fit 6.977268575068e-02 Line search: Cubically determined step, current gnorm 4.594912794004e-02 lambda=2.3644825912085998e-02 NL step 2, |residual|_2 = 4.594913e-02 Line search: gnorm after quadratic fit 5.502067932478e-02 Line search: Cubically determined step, current gnorm 4.494531294405e-02 lambda=4.1260497615261321e-02 NL step 3, |residual|_2 = 4.494531e-02 Line search: gnorm after quadratic fit 5.415371063247e-02 Line search: Cubically determined step, current gnorm 4.392165925471e-02 lambda=3.6375618871780056e-02 NL step 4, |residual|_2 = 4.392166e-02 Line search: gnorm after quadratic fit 4.631663976615e-02 Line search: Cubically determined step, current gnorm 4.246200798775e-02 lambda=5.0000000000000003e-02 NL step 5, |residual|_2 = 4.246201e-02 Line search: gnorm after quadratic fit 4.222105321728e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 6, |residual|_2 = 4.222105e-02 Line search: gnorm after quadratic fit 4.026081251872e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 7, |residual|_2 = 4.026081e-02 Line search: gnorm after quadratic fit 3.776439532346e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 8, |residual|_2 = 3.776440e-02 Line search: gnorm after quadratic fit 3.659796311121e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 9, |residual|_2 = 3.659796e-02 Line search: gnorm after quadratic fit 3.423207664901e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 10, |residual|_2 = 3.423208e-02 Line search: gnorm after quadratic fit 3.116928452225e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 11, |residual|_2 = 3.116928e-02 Line search: gnorm after quadratic fit 2.874310955274e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 12, |residual|_2 = 2.874311e-02 Line search: gnorm after quadratic fit 2.587826662305e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 13, |residual|_2 = 2.587827e-02 Line search: gnorm after quadratic fit 2.344161073075e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 14, |residual|_2 = 2.344161e-02 Line search: gnorm after quadratic fit 2.187719889554e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 15, |residual|_2 = 2.187720e-02 Line search: gnorm after quadratic fit 1.983089075086e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 16, |residual|_2 = 1.983089e-02 Line search: gnorm after quadratic fit 1.791227711151e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 17, |residual|_2 = 1.791228e-02 Line search: gnorm after quadratic fit 1.613250573900e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 18, |residual|_2 = 1.613251e-02 Line search: gnorm after quadratic fit 1.455841843183e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 19, |residual|_2 = 
1.455842e-02 Line search: gnorm after quadratic fit 1.321849780208e-02 Line search: Quadratically determined step, lambda=1.0574876450981290e-01 NL step 20, |residual|_2 = 1.321850e-02 Line search: gnorm after quadratic fit 9.209641609489e-03 Line search: Quadratically determined step, lambda=3.0589684959139674e-01 NL step 21, |residual|_2 = 9.209642e-03 Line search: gnorm after quadratic fit 7.590942028574e-03 Line search: Quadratically determined step, lambda=2.0920305898507460e-01 NL step 22, |residual|_2 = 7.590942e-03 Line search: gnorm after quadratic fit 4.373918927227e-03 Line search: Quadratically determined step, lambda=4.2379743128074154e-01 NL step 23, |residual|_2 = 4.373919e-03 Line search: gnorm after quadratic fit 3.681351665911e-03 Line search: Quadratically determined step, lambda=1.9626618428089049e-01 NL step 24, |residual|_2 = 3.681352e-03 Line search: gnorm after quadratic fit 2.594782418891e-03 Line search: Quadratically determined step, lambda=3.8057533372167579e-01 NL step 25, |residual|_2 = 2.594782e-03 Line search: gnorm after quadratic fit 1.803188279452e-03 Line search: Quadratically determined step, lambda=4.3574109448916826e-01 NL step 26, |residual|_2 = 1.803188e-03 Line search: Using full step: fnorm 1.803188279452e-03 gnorm 9.015947319176e-04 NL step 27, |residual|_2 = 9.015947e-04 Line search: Using full step: fnorm 9.015947319176e-04 gnorm 7.088879385731e-08 NL step 28, |residual|_2 = 7.088879e-08 Line search: gnorm after quadratic fit 7.088878906502e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088878957116e-08 lambda=2.1132490715284968e-01 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385683e-08 lambda=9.2196195824189087e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385711e-08 lambda=4.0004532931495446e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385722e-08 lambda=1.7374764617622523e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385726e-08 lambda=7.5449542135114234e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.2764749100364717e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4228361655588414e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.1787884492365153e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.6831916265377548e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.1651988987471248e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.0599757911789984e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.1973377296845284e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.5421268734746417e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.1437501409853001e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.7994589108402447e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=7.8143041004756041e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.3934283359762142e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 
lambda=1.4736252548330828e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.3993436038104693e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.7789696481734489e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2067913185456743e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.2405944320925838e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.2757729177880525e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.8827383810151057e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.2916635989551390e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.8636915940199893e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=8.0932400164504977e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.5145586412497970e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.5262271250668997e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.6277717206096633e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.8781665100197773e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2498684035299616e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.4276603549660526e-13 Line search: unable to find good step length! 
After 33 tries Line search: fnorm=7.0888793857309783e-08, gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial slope=-5.0252210945441613e-15 *Parallel case:* NL step 0, |residual|_2 = 4.714515e-02 Line search: gnorm after quadratic fit 7.862867755323e-02 Line search: Cubically determined step, current gnorm 4.663945043239e-02 lambda=1.4276549921126183e-02 NL step 1, |residual|_2 = 4.663945e-02 Line search: gnorm after quadratic fit 6.977268575068e-02 Line search: Cubically determined step, current gnorm 4.594912794004e-02 lambda=2.3644825912085998e-02 NL step 2, |residual|_2 = 4.594913e-02 Line search: gnorm after quadratic fit 5.502067932478e-02 Line search: Cubically determined step, current gnorm 4.494531294405e-02 lambda=4.1260497615261321e-02 NL step 3, |residual|_2 = 4.494531e-02 Line search: gnorm after quadratic fit 5.415371063247e-02 Line search: Cubically determined step, current gnorm 4.392165925471e-02 lambda=3.6375618871780056e-02 NL step 4, |residual|_2 = 4.392166e-02 Line search: gnorm after quadratic fit 4.631663976615e-02 Line search: Cubically determined step, current gnorm 4.246200798775e-02 lambda=5.0000000000000003e-02 NL step 5, |residual|_2 = 4.246201e-02 Line search: gnorm after quadratic fit 4.222105321728e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 6, |residual|_2 = 4.222105e-02 Line search: gnorm after quadratic fit 4.026081251872e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 7, |residual|_2 = 4.026081e-02 Line search: gnorm after quadratic fit 3.776439532346e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 8, |residual|_2 = 3.776440e-02 Line search: gnorm after quadratic fit 3.659796311121e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 9, |residual|_2 = 3.659796e-02 Line search: gnorm after quadratic fit 3.423207664901e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 10, |residual|_2 = 3.423208e-02 Line search: gnorm after quadratic fit 3.116928452225e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 11, |residual|_2 = 3.116928e-02 Line search: gnorm after quadratic fit 2.874310955274e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 12, |residual|_2 = 2.874311e-02 Line search: gnorm after quadratic fit 2.587826662305e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 13, |residual|_2 = 2.587827e-02 Line search: gnorm after quadratic fit 2.344161073075e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 14, |residual|_2 = 2.344161e-02 Line search: gnorm after quadratic fit 2.187719889554e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 15, |residual|_2 = 2.187720e-02 Line search: gnorm after quadratic fit 1.983089075086e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 16, |residual|_2 = 1.983089e-02 Line search: gnorm after quadratic fit 1.791227711151e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 17, |residual|_2 = 1.791228e-02 Line search: gnorm after quadratic fit 1.613250573900e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 18, |residual|_2 = 1.613251e-02 Line search: gnorm after quadratic fit 
1.455841843183e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 19, |residual|_2 = 1.455842e-02 Line search: gnorm after quadratic fit 1.321849780208e-02 Line search: Quadratically determined step, lambda=1.0574876450981290e-01 NL step 20, |residual|_2 = 1.321850e-02 Line search: gnorm after quadratic fit 9.209641609489e-03 Line search: Quadratically determined step, lambda=3.0589684959139674e-01 NL step 21, |residual|_2 = 9.209642e-03 Line search: gnorm after quadratic fit 7.590942028574e-03 Line search: Quadratically determined step, lambda=2.0920305898507460e-01 NL step 22, |residual|_2 = 7.590942e-03 Line search: gnorm after quadratic fit 4.373918927227e-03 Line search: Quadratically determined step, lambda=4.2379743128074154e-01 NL step 23, |residual|_2 = 4.373919e-03 Line search: gnorm after quadratic fit 3.681351665911e-03 Line search: Quadratically determined step, lambda=1.9626618428089049e-01 NL step 24, |residual|_2 = 3.681352e-03 Line search: gnorm after quadratic fit 2.594782418891e-03 Line search: Quadratically determined step, lambda=3.8057533372167579e-01 NL step 25, |residual|_2 = 2.594782e-03 Line search: gnorm after quadratic fit 1.803188279452e-03 Line search: Quadratically determined step, lambda=4.3574109448916826e-01 NL step 26, |residual|_2 = 1.803188e-03 Line search: Using full step: fnorm 1.803188279452e-03 gnorm 9.015947319176e-04 NL step 27, |residual|_2 = 9.015947e-04 Line search: Using full step: fnorm 9.015947319176e-04 gnorm 7.088879385731e-08 NL step 28, |residual|_2 = 7.088879e-08 Line search: gnorm after quadratic fit 7.088878906502e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088878957116e-08 lambda=2.1132490715284968e-01 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385683e-08 lambda=9.2196195824189087e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385711e-08 lambda=4.0004532931495446e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385722e-08 lambda=1.7374764617622523e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385726e-08 lambda=7.5449542135114234e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.2764749100364717e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4228361655588414e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.1787884492365153e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.6831916265377548e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.1651988987471248e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.0599757911789984e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.1973377296845284e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.5421268734746417e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.1437501409853001e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.7994589108402447e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=7.8143041004756041e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 
lambda=3.3934283359762142e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4736252548330828e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.3993436038104693e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.7789696481734489e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2067913185456743e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.2405944320925838e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.2757729177880525e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.8827383810151057e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.2916635989551390e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.8636915940199893e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=8.0932400164504977e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.5145586412497970e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.5262271250668997e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.6277717206096633e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.8781665100197773e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2498684035299616e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.4276603549660526e-13 Line search: unable to find good step length! After 33 tries Line search: fnorm=7.0888793857309783e-08, gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial slope=-5.0252210945441613e-15 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Wed Jan 13 14:51:19 2016 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 13 Jan 2016 15:51:19 -0500 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel In-Reply-To: References: Message-ID: Oops! I pasted the wrong text for the serial case. 
The correct text is below: *Serial case:* NL step 0, |residual|_2 = 4.714515e-02 Line search: gnorm after quadratic fit 7.862867755130e-02 Line search: Cubically determined step, current gnorm 4.663945044088e-02 lambda=1.4276549223307832e-02 NL step 1, |residual|_2 = 4.663945e-02 Line search: gnorm after quadratic fit 6.977268532963e-02 Line search: Cubically determined step, current gnorm 4.594912791877e-02 lambda=2.3644826349821228e-02 NL step 2, |residual|_2 = 4.594913e-02 Line search: gnorm after quadratic fit 5.502067915588e-02 Line search: Cubically determined step, current gnorm 4.494531287593e-02 lambda=4.1260496881982515e-02 NL step 3, |residual|_2 = 4.494531e-02 Line search: gnorm after quadratic fit 5.415371014813e-02 Line search: Cubically determined step, current gnorm 4.392165909219e-02 lambda=3.6375617606865668e-02 NL step 4, |residual|_2 = 4.392166e-02 Line search: gnorm after quadratic fit 4.631663907262e-02 Line search: Cubically determined step, current gnorm 4.246200768767e-02 lambda=5.0000000000000003e-02 NL step 5, |residual|_2 = 4.246201e-02 Line search: gnorm after quadratic fit 4.222105256158e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 6, |residual|_2 = 4.222105e-02 Line search: gnorm after quadratic fit 4.026081168915e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 7, |residual|_2 = 4.026081e-02 Line search: gnorm after quadratic fit 3.776439443011e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 8, |residual|_2 = 3.776439e-02 Line search: gnorm after quadratic fit 3.659796213553e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 9, |residual|_2 = 3.659796e-02 Line search: gnorm after quadratic fit 3.423207563496e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 10, |residual|_2 = 3.423208e-02 Line search: gnorm after quadratic fit 3.116928356075e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 11, |residual|_2 = 3.116928e-02 Line search: gnorm after quadratic fit 2.874310673331e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 12, |residual|_2 = 2.874311e-02 Line search: gnorm after quadratic fit 2.587826447631e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 13, |residual|_2 = 2.587826e-02 Line search: gnorm after quadratic fit 2.344160918669e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 14, |residual|_2 = 2.344161e-02 Line search: gnorm after quadratic fit 2.187719801063e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 15, |residual|_2 = 2.187720e-02 Line search: gnorm after quadratic fit 1.983089025936e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 16, |residual|_2 = 1.983089e-02 Line search: gnorm after quadratic fit 1.791227696650e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 17, |residual|_2 = 1.791228e-02 Line search: gnorm after quadratic fit 1.613250592206e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 18, |residual|_2 = 1.613251e-02 Line search: gnorm after quadratic fit 1.455841890804e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 19, |residual|_2 = 1.455842e-02 Line search: gnorm after quadratic fit 1.321849665170e-02 Line 
search: Quadratically determined step, lambda=1.0574900347563776e-01 NL step 20, |residual|_2 = 1.321850e-02 Line search: gnorm after quadratic fit 9.209642717528e-03 Line search: Quadratically determined step, lambda=3.0589679103560180e-01 NL step 21, |residual|_2 = 9.209643e-03 Line search: gnorm after quadratic fit 7.590944125425e-03 Line search: Quadratically determined step, lambda=2.0920307644146574e-01 NL step 22, |residual|_2 = 7.590944e-03 Line search: gnorm after quadratic fit 4.373921456388e-03 Line search: Quadratically determined step, lambda=4.2379743756255861e-01 NL step 23, |residual|_2 = 4.373921e-03 Line search: gnorm after quadratic fit 3.681355014898e-03 Line search: Quadratically determined step, lambda=1.9626628361883081e-01 NL step 24, |residual|_2 = 3.681355e-03 Line search: gnorm after quadratic fit 2.594785108727e-03 Line search: Quadratically determined step, lambda=3.8057573229158653e-01 NL step 25, |residual|_2 = 2.594785e-03 Line search: gnorm after quadratic fit 1.803191839408e-03 Line search: Quadratically determined step, lambda=4.3574150080610474e-01 NL step 26, |residual|_2 = 1.803192e-03 Line search: Using full step: fnorm 1.803191839408e-03 gnorm 9.015954497317e-04 NL step 27, |residual|_2 = 9.015954e-04 Line search: Using full step: fnorm 9.015954497317e-04 gnorm 1.390181456520e-13 NL step 28, |residual|_2 = 1.390181e-13 Number of nonlinear iterations: 28 On Wed, Jan 13, 2016 at 3:48 PM, David Knezevic wrote: > I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear > PDE solve. It converges well when I use 1 core. When I use 2 or more cores, > the line search stagnates. I've pasted the output of > -snes_linesearch_monitor below in these two cases. > > I was wondering if this implies that I must have a bug in parallel, or if > perhaps the NEWTONLS solver can behave slightly differently in parallel? 
> > Thanks, > David > > > --------------------------------------------------------------------------------------- > > > > *Parallel case:* > NL step 0, |residual|_2 = 4.714515e-02 > Line search: gnorm after quadratic fit 7.862867755323e-02 > Line search: Cubically determined step, current gnorm > 4.663945043239e-02 lambda=1.4276549921126183e-02 > NL step 1, |residual|_2 = 4.663945e-02 > Line search: gnorm after quadratic fit 6.977268575068e-02 > Line search: Cubically determined step, current gnorm > 4.594912794004e-02 lambda=2.3644825912085998e-02 > NL step 2, |residual|_2 = 4.594913e-02 > Line search: gnorm after quadratic fit 5.502067932478e-02 > Line search: Cubically determined step, current gnorm > 4.494531294405e-02 lambda=4.1260497615261321e-02 > NL step 3, |residual|_2 = 4.494531e-02 > Line search: gnorm after quadratic fit 5.415371063247e-02 > Line search: Cubically determined step, current gnorm > 4.392165925471e-02 lambda=3.6375618871780056e-02 > NL step 4, |residual|_2 = 4.392166e-02 > Line search: gnorm after quadratic fit 4.631663976615e-02 > Line search: Cubically determined step, current gnorm > 4.246200798775e-02 lambda=5.0000000000000003e-02 > NL step 5, |residual|_2 = 4.246201e-02 > Line search: gnorm after quadratic fit 4.222105321728e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 6, |residual|_2 = 4.222105e-02 > Line search: gnorm after quadratic fit 4.026081251872e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 7, |residual|_2 = 4.026081e-02 > Line search: gnorm after quadratic fit 3.776439532346e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 8, |residual|_2 = 3.776440e-02 > Line search: gnorm after quadratic fit 3.659796311121e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 9, |residual|_2 = 3.659796e-02 > Line search: gnorm after quadratic fit 3.423207664901e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 10, |residual|_2 = 3.423208e-02 > Line search: gnorm after quadratic fit 3.116928452225e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 11, |residual|_2 = 3.116928e-02 > Line search: gnorm after quadratic fit 2.874310955274e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 12, |residual|_2 = 2.874311e-02 > Line search: gnorm after quadratic fit 2.587826662305e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 13, |residual|_2 = 2.587827e-02 > Line search: gnorm after quadratic fit 2.344161073075e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 14, |residual|_2 = 2.344161e-02 > Line search: gnorm after quadratic fit 2.187719889554e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 15, |residual|_2 = 2.187720e-02 > Line search: gnorm after quadratic fit 1.983089075086e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 16, |residual|_2 = 1.983089e-02 > Line search: gnorm after quadratic fit 1.791227711151e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 17, |residual|_2 = 1.791228e-02 > Line search: gnorm after quadratic fit 1.613250573900e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 18, |residual|_2 = 
1.613251e-02 > Line search: gnorm after quadratic fit 1.455841843183e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 19, |residual|_2 = 1.455842e-02 > Line search: gnorm after quadratic fit 1.321849780208e-02 > Line search: Quadratically determined step, > lambda=1.0574876450981290e-01 > NL step 20, |residual|_2 = 1.321850e-02 > Line search: gnorm after quadratic fit 9.209641609489e-03 > Line search: Quadratically determined step, > lambda=3.0589684959139674e-01 > NL step 21, |residual|_2 = 9.209642e-03 > Line search: gnorm after quadratic fit 7.590942028574e-03 > Line search: Quadratically determined step, > lambda=2.0920305898507460e-01 > NL step 22, |residual|_2 = 7.590942e-03 > Line search: gnorm after quadratic fit 4.373918927227e-03 > Line search: Quadratically determined step, > lambda=4.2379743128074154e-01 > NL step 23, |residual|_2 = 4.373919e-03 > Line search: gnorm after quadratic fit 3.681351665911e-03 > Line search: Quadratically determined step, > lambda=1.9626618428089049e-01 > NL step 24, |residual|_2 = 3.681352e-03 > Line search: gnorm after quadratic fit 2.594782418891e-03 > Line search: Quadratically determined step, > lambda=3.8057533372167579e-01 > NL step 25, |residual|_2 = 2.594782e-03 > Line search: gnorm after quadratic fit 1.803188279452e-03 > Line search: Quadratically determined step, > lambda=4.3574109448916826e-01 > NL step 26, |residual|_2 = 1.803188e-03 > Line search: Using full step: fnorm 1.803188279452e-03 gnorm > 9.015947319176e-04 > NL step 27, |residual|_2 = 9.015947e-04 > Line search: Using full step: fnorm 9.015947319176e-04 gnorm > 7.088879385731e-08 > NL step 28, |residual|_2 = 7.088879e-08 > Line search: gnorm after quadratic fit 7.088878906502e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088878957116e-08 lambda=2.1132490715284968e-01 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385683e-08 lambda=9.2196195824189087e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385711e-08 lambda=4.0004532931495446e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385722e-08 lambda=1.7374764617622523e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385726e-08 lambda=7.5449542135114234e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.2764749100364717e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4228361655588414e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.1787884492365153e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.6831916265377548e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.1651988987471248e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.0599757911789984e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.1973377296845284e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.5421268734746417e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.1437501409853001e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.7994589108402447e-06 > Line search: 
Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=7.8143041004756041e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.3934283359762142e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4736252548330828e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.3993436038104693e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.7789696481734489e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2067913185456743e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.2405944320925838e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.2757729177880525e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.8827383810151057e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.2916635989551390e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.8636915940199893e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=8.0932400164504977e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.5145586412497970e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.5262271250668997e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.6277717206096633e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.8781665100197773e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2498684035299616e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.4276603549660526e-13 > Line search: unable to find good step length! After 33 tries > Line search: fnorm=7.0888793857309783e-08, > gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, > minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial > slope=-5.0252210945441613e-15 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jan 13 15:05:49 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 15:05:49 -0600 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel In-Reply-To: References: Message-ID: Since you are using a direct solver almost for sure a bug in your parallel function or parallel Jacobian. Try -snes_mf_operator try -snes_fd try -snes_type test as three different approaches to see what is going on. Barry > On Jan 13, 2016, at 2:51 PM, David Knezevic wrote: > > Oops! I pasted the wrong text for the serial case. 
The correct text is below: > > Serial case: > NL step 0, |residual|_2 = 4.714515e-02 > Line search: gnorm after quadratic fit 7.862867755130e-02 > Line search: Cubically determined step, current gnorm 4.663945044088e-02 lambda=1.4276549223307832e-02 > NL step 1, |residual|_2 = 4.663945e-02 > Line search: gnorm after quadratic fit 6.977268532963e-02 > Line search: Cubically determined step, current gnorm 4.594912791877e-02 lambda=2.3644826349821228e-02 > NL step 2, |residual|_2 = 4.594913e-02 > Line search: gnorm after quadratic fit 5.502067915588e-02 > Line search: Cubically determined step, current gnorm 4.494531287593e-02 lambda=4.1260496881982515e-02 > NL step 3, |residual|_2 = 4.494531e-02 > Line search: gnorm after quadratic fit 5.415371014813e-02 > Line search: Cubically determined step, current gnorm 4.392165909219e-02 lambda=3.6375617606865668e-02 > NL step 4, |residual|_2 = 4.392166e-02 > Line search: gnorm after quadratic fit 4.631663907262e-02 > Line search: Cubically determined step, current gnorm 4.246200768767e-02 lambda=5.0000000000000003e-02 > NL step 5, |residual|_2 = 4.246201e-02 > Line search: gnorm after quadratic fit 4.222105256158e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 6, |residual|_2 = 4.222105e-02 > Line search: gnorm after quadratic fit 4.026081168915e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 7, |residual|_2 = 4.026081e-02 > Line search: gnorm after quadratic fit 3.776439443011e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 8, |residual|_2 = 3.776439e-02 > Line search: gnorm after quadratic fit 3.659796213553e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 9, |residual|_2 = 3.659796e-02 > Line search: gnorm after quadratic fit 3.423207563496e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 10, |residual|_2 = 3.423208e-02 > Line search: gnorm after quadratic fit 3.116928356075e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 11, |residual|_2 = 3.116928e-02 > Line search: gnorm after quadratic fit 2.874310673331e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 12, |residual|_2 = 2.874311e-02 > Line search: gnorm after quadratic fit 2.587826447631e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 13, |residual|_2 = 2.587826e-02 > Line search: gnorm after quadratic fit 2.344160918669e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 14, |residual|_2 = 2.344161e-02 > Line search: gnorm after quadratic fit 2.187719801063e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 15, |residual|_2 = 2.187720e-02 > Line search: gnorm after quadratic fit 1.983089025936e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 16, |residual|_2 = 1.983089e-02 > Line search: gnorm after quadratic fit 1.791227696650e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 17, |residual|_2 = 1.791228e-02 > Line search: gnorm after quadratic fit 1.613250592206e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 18, |residual|_2 = 1.613251e-02 > Line search: gnorm after quadratic fit 1.455841890804e-02 > Line search: Quadratically determined step, 
lambda=1.0000000000000001e-01 > NL step 19, |residual|_2 = 1.455842e-02 > Line search: gnorm after quadratic fit 1.321849665170e-02 > Line search: Quadratically determined step, lambda=1.0574900347563776e-01 > NL step 20, |residual|_2 = 1.321850e-02 > Line search: gnorm after quadratic fit 9.209642717528e-03 > Line search: Quadratically determined step, lambda=3.0589679103560180e-01 > NL step 21, |residual|_2 = 9.209643e-03 > Line search: gnorm after quadratic fit 7.590944125425e-03 > Line search: Quadratically determined step, lambda=2.0920307644146574e-01 > NL step 22, |residual|_2 = 7.590944e-03 > Line search: gnorm after quadratic fit 4.373921456388e-03 > Line search: Quadratically determined step, lambda=4.2379743756255861e-01 > NL step 23, |residual|_2 = 4.373921e-03 > Line search: gnorm after quadratic fit 3.681355014898e-03 > Line search: Quadratically determined step, lambda=1.9626628361883081e-01 > NL step 24, |residual|_2 = 3.681355e-03 > Line search: gnorm after quadratic fit 2.594785108727e-03 > Line search: Quadratically determined step, lambda=3.8057573229158653e-01 > NL step 25, |residual|_2 = 2.594785e-03 > Line search: gnorm after quadratic fit 1.803191839408e-03 > Line search: Quadratically determined step, lambda=4.3574150080610474e-01 > NL step 26, |residual|_2 = 1.803192e-03 > Line search: Using full step: fnorm 1.803191839408e-03 gnorm 9.015954497317e-04 > NL step 27, |residual|_2 = 9.015954e-04 > Line search: Using full step: fnorm 9.015954497317e-04 gnorm 1.390181456520e-13 > NL step 28, |residual|_2 = 1.390181e-13 > Number of nonlinear iterations: 28 > > > On Wed, Jan 13, 2016 at 3:48 PM, David Knezevic wrote: > I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear PDE solve. It converges well when I use 1 core. When I use 2 or more cores, the line search stagnates. I've pasted the output of -snes_linesearch_monitor below in these two cases. > > I was wondering if this implies that I must have a bug in parallel, or if perhaps the NEWTONLS solver can behave slightly differently in parallel? 
> > Thanks, > David > > --------------------------------------------------------------------------------------- > > > > Parallel case: > NL step 0, |residual|_2 = 4.714515e-02 > Line search: gnorm after quadratic fit 7.862867755323e-02 > Line search: Cubically determined step, current gnorm 4.663945043239e-02 lambda=1.4276549921126183e-02 > NL step 1, |residual|_2 = 4.663945e-02 > Line search: gnorm after quadratic fit 6.977268575068e-02 > Line search: Cubically determined step, current gnorm 4.594912794004e-02 lambda=2.3644825912085998e-02 > NL step 2, |residual|_2 = 4.594913e-02 > Line search: gnorm after quadratic fit 5.502067932478e-02 > Line search: Cubically determined step, current gnorm 4.494531294405e-02 lambda=4.1260497615261321e-02 > NL step 3, |residual|_2 = 4.494531e-02 > Line search: gnorm after quadratic fit 5.415371063247e-02 > Line search: Cubically determined step, current gnorm 4.392165925471e-02 lambda=3.6375618871780056e-02 > NL step 4, |residual|_2 = 4.392166e-02 > Line search: gnorm after quadratic fit 4.631663976615e-02 > Line search: Cubically determined step, current gnorm 4.246200798775e-02 lambda=5.0000000000000003e-02 > NL step 5, |residual|_2 = 4.246201e-02 > Line search: gnorm after quadratic fit 4.222105321728e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 6, |residual|_2 = 4.222105e-02 > Line search: gnorm after quadratic fit 4.026081251872e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 7, |residual|_2 = 4.026081e-02 > Line search: gnorm after quadratic fit 3.776439532346e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 8, |residual|_2 = 3.776440e-02 > Line search: gnorm after quadratic fit 3.659796311121e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 9, |residual|_2 = 3.659796e-02 > Line search: gnorm after quadratic fit 3.423207664901e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 10, |residual|_2 = 3.423208e-02 > Line search: gnorm after quadratic fit 3.116928452225e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 11, |residual|_2 = 3.116928e-02 > Line search: gnorm after quadratic fit 2.874310955274e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 12, |residual|_2 = 2.874311e-02 > Line search: gnorm after quadratic fit 2.587826662305e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 13, |residual|_2 = 2.587827e-02 > Line search: gnorm after quadratic fit 2.344161073075e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 14, |residual|_2 = 2.344161e-02 > Line search: gnorm after quadratic fit 2.187719889554e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 15, |residual|_2 = 2.187720e-02 > Line search: gnorm after quadratic fit 1.983089075086e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 16, |residual|_2 = 1.983089e-02 > Line search: gnorm after quadratic fit 1.791227711151e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 17, |residual|_2 = 1.791228e-02 > Line search: gnorm after quadratic fit 1.613250573900e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 18, |residual|_2 = 1.613251e-02 > Line search: gnorm after quadratic 
fit 1.455841843183e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 19, |residual|_2 = 1.455842e-02 > Line search: gnorm after quadratic fit 1.321849780208e-02 > Line search: Quadratically determined step, lambda=1.0574876450981290e-01 > NL step 20, |residual|_2 = 1.321850e-02 > Line search: gnorm after quadratic fit 9.209641609489e-03 > Line search: Quadratically determined step, lambda=3.0589684959139674e-01 > NL step 21, |residual|_2 = 9.209642e-03 > Line search: gnorm after quadratic fit 7.590942028574e-03 > Line search: Quadratically determined step, lambda=2.0920305898507460e-01 > NL step 22, |residual|_2 = 7.590942e-03 > Line search: gnorm after quadratic fit 4.373918927227e-03 > Line search: Quadratically determined step, lambda=4.2379743128074154e-01 > NL step 23, |residual|_2 = 4.373919e-03 > Line search: gnorm after quadratic fit 3.681351665911e-03 > Line search: Quadratically determined step, lambda=1.9626618428089049e-01 > NL step 24, |residual|_2 = 3.681352e-03 > Line search: gnorm after quadratic fit 2.594782418891e-03 > Line search: Quadratically determined step, lambda=3.8057533372167579e-01 > NL step 25, |residual|_2 = 2.594782e-03 > Line search: gnorm after quadratic fit 1.803188279452e-03 > Line search: Quadratically determined step, lambda=4.3574109448916826e-01 > NL step 26, |residual|_2 = 1.803188e-03 > Line search: Using full step: fnorm 1.803188279452e-03 gnorm 9.015947319176e-04 > NL step 27, |residual|_2 = 9.015947e-04 > Line search: Using full step: fnorm 9.015947319176e-04 gnorm 7.088879385731e-08 > NL step 28, |residual|_2 = 7.088879e-08 > Line search: gnorm after quadratic fit 7.088878906502e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088878957116e-08 lambda=2.1132490715284968e-01 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385683e-08 lambda=9.2196195824189087e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385711e-08 lambda=4.0004532931495446e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385722e-08 lambda=1.7374764617622523e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385726e-08 lambda=7.5449542135114234e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.2764749100364717e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4228361655588414e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.1787884492365153e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.6831916265377548e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.1651988987471248e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.0599757911789984e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.1973377296845284e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.5421268734746417e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.1437501409853001e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.7994589108402447e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=7.8143041004756041e-07 > 
Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.3934283359762142e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4736252548330828e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.3993436038104693e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.7789696481734489e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2067913185456743e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.2405944320925838e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.2757729177880525e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.8827383810151057e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.2916635989551390e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.8636915940199893e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=8.0932400164504977e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.5145586412497970e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.5262271250668997e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.6277717206096633e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.8781665100197773e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2498684035299616e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.4276603549660526e-13 > Line search: unable to find good step length! After 33 tries > Line search: fnorm=7.0888793857309783e-08, gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial slope=-5.0252210945441613e-15 > > From david.knezevic at akselos.com Wed Jan 13 15:08:01 2016 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 13 Jan 2016 16:08:01 -0500 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel In-Reply-To: References: Message-ID: OK, will do, thanks. David On Wed, Jan 13, 2016 at 4:05 PM, Barry Smith wrote: > > Since you are using a direct solver almost for sure a bug in your > parallel function or parallel Jacobian. > > Try -snes_mf_operator try -snes_fd try -snes_type test as three > different approaches to see what is going on. > > Barry > > > On Jan 13, 2016, at 2:51 PM, David Knezevic > wrote: > > > > Oops! I pasted the wrong text for the serial case. 
The correct text is > below: > > > > Serial case: > > NL step 0, |residual|_2 = 4.714515e-02 > > Line search: gnorm after quadratic fit 7.862867755130e-02 > > Line search: Cubically determined step, current gnorm > 4.663945044088e-02 lambda=1.4276549223307832e-02 > > NL step 1, |residual|_2 = 4.663945e-02 > > Line search: gnorm after quadratic fit 6.977268532963e-02 > > Line search: Cubically determined step, current gnorm > 4.594912791877e-02 lambda=2.3644826349821228e-02 > > NL step 2, |residual|_2 = 4.594913e-02 > > Line search: gnorm after quadratic fit 5.502067915588e-02 > > Line search: Cubically determined step, current gnorm > 4.494531287593e-02 lambda=4.1260496881982515e-02 > > NL step 3, |residual|_2 = 4.494531e-02 > > Line search: gnorm after quadratic fit 5.415371014813e-02 > > Line search: Cubically determined step, current gnorm > 4.392165909219e-02 lambda=3.6375617606865668e-02 > > NL step 4, |residual|_2 = 4.392166e-02 > > Line search: gnorm after quadratic fit 4.631663907262e-02 > > Line search: Cubically determined step, current gnorm > 4.246200768767e-02 lambda=5.0000000000000003e-02 > > NL step 5, |residual|_2 = 4.246201e-02 > > Line search: gnorm after quadratic fit 4.222105256158e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 6, |residual|_2 = 4.222105e-02 > > Line search: gnorm after quadratic fit 4.026081168915e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 7, |residual|_2 = 4.026081e-02 > > Line search: gnorm after quadratic fit 3.776439443011e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 8, |residual|_2 = 3.776439e-02 > > Line search: gnorm after quadratic fit 3.659796213553e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 9, |residual|_2 = 3.659796e-02 > > Line search: gnorm after quadratic fit 3.423207563496e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 10, |residual|_2 = 3.423208e-02 > > Line search: gnorm after quadratic fit 3.116928356075e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 11, |residual|_2 = 3.116928e-02 > > Line search: gnorm after quadratic fit 2.874310673331e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 12, |residual|_2 = 2.874311e-02 > > Line search: gnorm after quadratic fit 2.587826447631e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 13, |residual|_2 = 2.587826e-02 > > Line search: gnorm after quadratic fit 2.344160918669e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 14, |residual|_2 = 2.344161e-02 > > Line search: gnorm after quadratic fit 2.187719801063e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 15, |residual|_2 = 2.187720e-02 > > Line search: gnorm after quadratic fit 1.983089025936e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 16, |residual|_2 = 1.983089e-02 > > Line search: gnorm after quadratic fit 1.791227696650e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 17, |residual|_2 = 1.791228e-02 > > Line search: gnorm after quadratic fit 1.613250592206e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 18, 
|residual|_2 = 1.613251e-02 > > Line search: gnorm after quadratic fit 1.455841890804e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 19, |residual|_2 = 1.455842e-02 > > Line search: gnorm after quadratic fit 1.321849665170e-02 > > Line search: Quadratically determined step, > lambda=1.0574900347563776e-01 > > NL step 20, |residual|_2 = 1.321850e-02 > > Line search: gnorm after quadratic fit 9.209642717528e-03 > > Line search: Quadratically determined step, > lambda=3.0589679103560180e-01 > > NL step 21, |residual|_2 = 9.209643e-03 > > Line search: gnorm after quadratic fit 7.590944125425e-03 > > Line search: Quadratically determined step, > lambda=2.0920307644146574e-01 > > NL step 22, |residual|_2 = 7.590944e-03 > > Line search: gnorm after quadratic fit 4.373921456388e-03 > > Line search: Quadratically determined step, > lambda=4.2379743756255861e-01 > > NL step 23, |residual|_2 = 4.373921e-03 > > Line search: gnorm after quadratic fit 3.681355014898e-03 > > Line search: Quadratically determined step, > lambda=1.9626628361883081e-01 > > NL step 24, |residual|_2 = 3.681355e-03 > > Line search: gnorm after quadratic fit 2.594785108727e-03 > > Line search: Quadratically determined step, > lambda=3.8057573229158653e-01 > > NL step 25, |residual|_2 = 2.594785e-03 > > Line search: gnorm after quadratic fit 1.803191839408e-03 > > Line search: Quadratically determined step, > lambda=4.3574150080610474e-01 > > NL step 26, |residual|_2 = 1.803192e-03 > > Line search: Using full step: fnorm 1.803191839408e-03 gnorm > 9.015954497317e-04 > > NL step 27, |residual|_2 = 9.015954e-04 > > Line search: Using full step: fnorm 9.015954497317e-04 gnorm > 1.390181456520e-13 > > NL step 28, |residual|_2 = 1.390181e-13 > > Number of nonlinear iterations: 28 > > > > > > On Wed, Jan 13, 2016 at 3:48 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > > I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear > PDE solve. It converges well when I use 1 core. When I use 2 or more cores, > the line search stagnates. I've pasted the output of > -snes_linesearch_monitor below in these two cases. > > > > I was wondering if this implies that I must have a bug in parallel, or > if perhaps the NEWTONLS solver can behave slightly differently in parallel? 
> > > > Thanks, > > David > > > > > --------------------------------------------------------------------------------------- > > > > > > > > Parallel case: > > NL step 0, |residual|_2 = 4.714515e-02 > > Line search: gnorm after quadratic fit 7.862867755323e-02 > > Line search: Cubically determined step, current gnorm > 4.663945043239e-02 lambda=1.4276549921126183e-02 > > NL step 1, |residual|_2 = 4.663945e-02 > > Line search: gnorm after quadratic fit 6.977268575068e-02 > > Line search: Cubically determined step, current gnorm > 4.594912794004e-02 lambda=2.3644825912085998e-02 > > NL step 2, |residual|_2 = 4.594913e-02 > > Line search: gnorm after quadratic fit 5.502067932478e-02 > > Line search: Cubically determined step, current gnorm > 4.494531294405e-02 lambda=4.1260497615261321e-02 > > NL step 3, |residual|_2 = 4.494531e-02 > > Line search: gnorm after quadratic fit 5.415371063247e-02 > > Line search: Cubically determined step, current gnorm > 4.392165925471e-02 lambda=3.6375618871780056e-02 > > NL step 4, |residual|_2 = 4.392166e-02 > > Line search: gnorm after quadratic fit 4.631663976615e-02 > > Line search: Cubically determined step, current gnorm > 4.246200798775e-02 lambda=5.0000000000000003e-02 > > NL step 5, |residual|_2 = 4.246201e-02 > > Line search: gnorm after quadratic fit 4.222105321728e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 6, |residual|_2 = 4.222105e-02 > > Line search: gnorm after quadratic fit 4.026081251872e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 7, |residual|_2 = 4.026081e-02 > > Line search: gnorm after quadratic fit 3.776439532346e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 8, |residual|_2 = 3.776440e-02 > > Line search: gnorm after quadratic fit 3.659796311121e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 9, |residual|_2 = 3.659796e-02 > > Line search: gnorm after quadratic fit 3.423207664901e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 10, |residual|_2 = 3.423208e-02 > > Line search: gnorm after quadratic fit 3.116928452225e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 11, |residual|_2 = 3.116928e-02 > > Line search: gnorm after quadratic fit 2.874310955274e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 12, |residual|_2 = 2.874311e-02 > > Line search: gnorm after quadratic fit 2.587826662305e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 13, |residual|_2 = 2.587827e-02 > > Line search: gnorm after quadratic fit 2.344161073075e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 14, |residual|_2 = 2.344161e-02 > > Line search: gnorm after quadratic fit 2.187719889554e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 15, |residual|_2 = 2.187720e-02 > > Line search: gnorm after quadratic fit 1.983089075086e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 16, |residual|_2 = 1.983089e-02 > > Line search: gnorm after quadratic fit 1.791227711151e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 17, |residual|_2 = 1.791228e-02 > > Line search: gnorm after quadratic fit 
1.613250573900e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 18, |residual|_2 = 1.613251e-02 > > Line search: gnorm after quadratic fit 1.455841843183e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 19, |residual|_2 = 1.455842e-02 > > Line search: gnorm after quadratic fit 1.321849780208e-02 > > Line search: Quadratically determined step, > lambda=1.0574876450981290e-01 > > NL step 20, |residual|_2 = 1.321850e-02 > > Line search: gnorm after quadratic fit 9.209641609489e-03 > > Line search: Quadratically determined step, > lambda=3.0589684959139674e-01 > > NL step 21, |residual|_2 = 9.209642e-03 > > Line search: gnorm after quadratic fit 7.590942028574e-03 > > Line search: Quadratically determined step, > lambda=2.0920305898507460e-01 > > NL step 22, |residual|_2 = 7.590942e-03 > > Line search: gnorm after quadratic fit 4.373918927227e-03 > > Line search: Quadratically determined step, > lambda=4.2379743128074154e-01 > > NL step 23, |residual|_2 = 4.373919e-03 > > Line search: gnorm after quadratic fit 3.681351665911e-03 > > Line search: Quadratically determined step, > lambda=1.9626618428089049e-01 > > NL step 24, |residual|_2 = 3.681352e-03 > > Line search: gnorm after quadratic fit 2.594782418891e-03 > > Line search: Quadratically determined step, > lambda=3.8057533372167579e-01 > > NL step 25, |residual|_2 = 2.594782e-03 > > Line search: gnorm after quadratic fit 1.803188279452e-03 > > Line search: Quadratically determined step, > lambda=4.3574109448916826e-01 > > NL step 26, |residual|_2 = 1.803188e-03 > > Line search: Using full step: fnorm 1.803188279452e-03 gnorm > 9.015947319176e-04 > > NL step 27, |residual|_2 = 9.015947e-04 > > Line search: Using full step: fnorm 9.015947319176e-04 gnorm > 7.088879385731e-08 > > NL step 28, |residual|_2 = 7.088879e-08 > > Line search: gnorm after quadratic fit 7.088878906502e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088878957116e-08 lambda=2.1132490715284968e-01 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385683e-08 lambda=9.2196195824189087e-02 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385711e-08 lambda=4.0004532931495446e-02 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385722e-08 lambda=1.7374764617622523e-02 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385726e-08 lambda=7.5449542135114234e-03 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.2764749100364717e-03 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4228361655588414e-03 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.1787884492365153e-04 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.6831916265377548e-04 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.1651988987471248e-04 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.0599757911789984e-05 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.1973377296845284e-05 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.5421268734746417e-06 > > Line search: Cubic step no good, 
shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.1437501409853001e-06 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.7994589108402447e-06 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=7.8143041004756041e-07 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.3934283359762142e-07 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4736252548330828e-07 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.3993436038104693e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.7789696481734489e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2067913185456743e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.2405944320925838e-09 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.2757729177880525e-09 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.8827383810151057e-10 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.2916635989551390e-10 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.8636915940199893e-10 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=8.0932400164504977e-11 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.5145586412497970e-11 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.5262271250668997e-11 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.6277717206096633e-12 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.8781665100197773e-12 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2498684035299616e-12 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.4276603549660526e-13 > > Line search: unable to find good step length! After 33 tries > > Line search: fnorm=7.0888793857309783e-08, > gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, > minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial > slope=-5.0252210945441613e-15 > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Wed Jan 13 20:01:33 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 02:01:33 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X Message-ID: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Hi Folks, I am trying to profile my application code that uses a lot of PETSc solvers. I am running applications on OS X - Yosemite. I am thinking of using HPCToolKit for the purpose, but could not find a dmg package for that. I have access to a remote linux machine that has HPCToolkit and HPCViewer installed on it ? so I just need to have a viewer on my local Mac machine to analyze the files generate by HPCToolkit. Has anyone tried building/installing these packages on OS X? Thanks, ? 
Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 13 20:22:22 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 13 Jan 2016 20:22:22 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Message-ID: On Wed, Jan 13, 2016 at 8:01 PM, Bhalla, Amneet Pal S wrote: > > Hi Folks, > > I am trying to profile my application code that uses a lot of PETSc > solvers. I am running applications on OS X - Yosemite. I am thinking > of using HPCToolKit for the purpose, but could not find a dmg package for > that. I have access to a remote linux machine that has HPCToolkit > and HPCViewer installed on it ? so I just need to have a viewer on my > local Mac machine to analyze the files generate by HPCToolkit. > Has anyone tried building/installing these packages on OS X? > I have not done it on OSX. Can you mail us a -log_summary for a rough cut? Sometimes its hard to interpret the data avalanche from one of those tools without a simple map. Thanks, Matt > Thanks, > > ? Amneet > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jan 13 20:59:05 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 20:59:05 -0600 Subject: [petsc-users] [petsc-maint] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Message-ID: <6571084C-52AF-4CD4-B96A-A9EECB924060@mcs.anl.gov> The Instruments tool on the Mac, part of Xcode is trivial to use (you don't need to use Xcode GUI to build) and seems to provide useful information. Barry > On Jan 13, 2016, at 8:22 PM, Matthew Knepley wrote: > > On Wed, Jan 13, 2016 at 8:01 PM, Bhalla, Amneet Pal S wrote: > > Hi Folks, > > I am trying to profile my application code that uses a lot of PETSc solvers. I am running applications on OS X - Yosemite. I am thinking > of using HPCToolKit for the purpose, but could not find a dmg package for that. I have access to a remote linux machine that has HPCToolkit > and HPCViewer installed on it ? so I just need to have a viewer on my local Mac machine to analyze the files generate by HPCToolkit. > Has anyone tried building/installing these packages on OS X? > > I have not done it on OSX. Can you mail us a -log_summary for a rough cut? Sometimes its hard > to interpret the data avalanche from one of those tools without a simple map. > > Thanks, > > Matt > > Thanks, > > ? Amneet > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From jychang48 at gmail.com Wed Jan 13 21:05:44 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 20:05:44 -0700 Subject: [petsc-users] Difference between Block Jacobi and ILU? Message-ID: Hi all, What exactly is the difference between these two preconditioners? When I use them to solve a Galerkin finite element poisson problem, I get the exact same performance (iterations, wall-clock time, etc). Only thing is I can't seem to use ILU in parallel though. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Jan 13 21:26:30 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 13 Jan 2016 21:26:30 -0600 Subject: [petsc-users] Difference between Block Jacobi and ILU? In-Reply-To: References: Message-ID: On Wed, 13 Jan 2016, Justin Chang wrote: > Hi all, > > What exactly is the difference between these two preconditioners? When I > use them to solve a Galerkin finite element poisson problem, I get the > exact same performance (iterations, wall-clock time, etc). you mean - when you run sequentially? With block jacobi - you decide the number of blocks. The default is 1-block/proc i.e - for sequnetial run you have only 1block i.e the whole matrix. So the following are essentially the same: -pc_type bjacobi -pc_bjacobi_blocks 1 [default] -sub_pc_type ilu [default] -pc_type ilu Satish > Only thing is I can't seem to use ILU in parallel though. From jychang48 at gmail.com Wed Jan 13 21:37:12 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 20:37:12 -0700 Subject: [petsc-users] Difference between Block Jacobi and ILU? In-Reply-To: References: Message-ID: Thanks Satish, And yes I meant sequentially. On Wed, Jan 13, 2016 at 8:26 PM, Satish Balay wrote: > On Wed, 13 Jan 2016, Justin Chang wrote: > > > Hi all, > > > > What exactly is the difference between these two preconditioners? When I > > use them to solve a Galerkin finite element poisson problem, I get the > > exact same performance (iterations, wall-clock time, etc). > > you mean - when you run sequentially? > > With block jacobi - you decide the number of blocks. The default is > 1-block/proc > i.e - for sequnetial run you have only 1block i.e the whole matrix. > > So the following are essentially the same: > -pc_type bjacobi -pc_bjacobi_blocks 1 [default] -sub_pc_type ilu [default] > -pc_type ilu > > Satish > > > Only thing is I can't seem to use ILU in parallel though. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Wed Jan 13 21:57:38 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 20:57:38 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? Message-ID: Hi all, 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jan 13 22:12:20 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 22:12:20 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: Message-ID: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > Hi all, > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices. > > Thanks, > Justin From jychang48 at gmail.com Wed Jan 13 22:24:46 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 21:24:46 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: Thanks Barry, 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth? 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ? Thanks, Justin On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > Hi all, > > > > 1) I am guessing MATMPIBAIJ could theoretically have better performance > than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning > that block (dense) matrix-vector multiply is "faster" than simple > matrix-vector? > > See for example table 1 in > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > 2) I am looking through the manual and online documentation and it seems > the term "block" used everywhere. In the section on "block matrices" (3.1.3 > of the manual), it refers to field splitting, where you could either have a > monolithic matrix or a nested matrix. Does that concept have anything to do > with MATMPIBAIJ? > > Unfortunately the numerical analysis literature uses the term block in > multiple ways. 
For small blocks, sometimes called "point-block" with BAIJ > and for very large blocks (where the blocks are sparse themselves). I used > fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > It makes sense to me that one could create a BAIJ where if you have 5 > dofs of the same type of physics (e.g., five different primary species of a > geochemical reaction) per grid point, you could create a block size of 5. > And if you have different physics (e.g., velocity and pressure) you would > ideally want to separate them out (i.e., nested matrices) for better > preconditioning. > > Sometimes you put them together with BAIJ and sometimes you keep them > separate with nested matrices. > > > > > Thanks, > > Justin > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Wed Jan 13 22:42:21 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Wed, 13 Jan 2016 23:42:21 -0500 Subject: [petsc-users] compiler error Message-ID: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> I haven?t seen this before: /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); ^ /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jan 13 22:54:02 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 13 Jan 2016 22:54:02 -0600 Subject: [petsc-users] compiler error In-Reply-To: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: On Wed, 13 Jan 2016, Gideon Simpson wrote: > I haven?t seen this before: > > /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > ^ > > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); Try: PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); Satish From bsmith at mcs.anl.gov Wed Jan 13 23:12:18 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 23:12:18 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: > > Thanks Barry, > > 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? 
Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth? There are two distinct reasons for the improvement: 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" savings is that you have to load something that is much smaller than before. Cache/spatial locality have nothing to do with this particular improvement. 2) The other improvement comes from the reuse of each x[j] value multiplied by 5 values (a column) of the little block. The hardware explanation is that x[j] can be reused in a register for the 5 multiplies (while otherwise it would have to come from cache to register 5 times and sometimes might even have been flushed from the cache so would have to come from memory). This is why we have code like for (j=0; j > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ? Sure, but if you have separated all the variables of pressure, velocity_x, velocity_y, etc into there own regions of the vector then the block size for the sub matrices would be 1 so BAIJ does not help. There are Stokes solvers that use Vanka smoothing that keep the variables interlaced and hence would use BAIJ and NOT use fieldsplit > > Thanks, > Justin > > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > Hi all, > > > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? > > See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? > > Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. > > Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices. 
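The inner loop shown above ("This is why we have code like for (j=0; j ...") was mangled by the list's HTML scrubbing. Below is a sketch reconstructing the 5x5 block-row product from the surviving fragments; the function wrapper, the declarations, and the loop bound n (the number of 5x5 blocks in the row) are assumptions added so the fragment is self-contained, not PETSc's verbatim source.

#include <petscsys.h>

/* Sketch: accumulate one block row of y = A*x for 5x5 blocks.
   Each x value is pulled into a register once and reused for the
   five multiplies of its block column. */
static void BlockRowMult5(PetscInt n, const PetscInt *idx,
                          const PetscScalar *v, const PetscScalar *x,
                          PetscScalar sum[5])
{
  PetscScalar       sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0;
  PetscScalar       x1, x2, x3, x4, x5;
  const PetscScalar *xb;
  PetscInt          j;

  for (j = 0; j < n; j++) {      /* n = number of 5x5 blocks in this row (assumed) */
    xb = x + 5*(*idx++);         /* the 5 x-values for this block column */
    x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
    sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5;
    sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5;
    sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5;
    sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5;
    sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5;
    v += 25;                     /* advance to the next dense 5x5 block */
  }
  sum[0] = sum1; sum[1] = sum2; sum[2] = sum3; sum[3] = sum4; sum[4] = sum5;
}

The single column index per block (rather than per scalar entry) is also where the 1/25th-sized ja array in point 1) comes from.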
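For the point-block case discussed in this thread (e.g. five species per grid point, so block size 5), a minimal creation sketch follows; the function name, the preallocation counts, and nlocal_nodes are illustrative assumptions, not taken from the thread.

#include <petscmat.h>

/* Sketch: create a point-block matrix with 5 dofs per node (bs = 5). */
static PetscErrorCode CreateSpeciesMatrix(MPI_Comm comm, PetscInt nlocal_nodes, Mat *A)
{
  const PetscInt bs = 5;
  PetscErrorCode ierr;

  ierr = MatCreate(comm, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, bs*nlocal_nodes, bs*nlocal_nodes,
                     PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATBAIJ);CHKERRQ(ierr);  /* resolves to SEQBAIJ or MPIBAIJ */
  /* block-row preallocation; the counts here are rough guesses for a 2D stencil */
  ierr = MatSeqBAIJSetPreallocation(*A, bs, 9, NULL);CHKERRQ(ierr);
  ierr = MatMPIBAIJSetPreallocation(*A, bs, 9, NULL, 4, NULL);CHKERRQ(ierr);
  return 0;
}

Entries then go in one 5x5 block at a time with block (node) indices, e.g. MatSetValuesBlocked(A, 1, &brow, 1, &bcol, blockvals, ADD_VALUES) with blockvals holding the dense 5x5 coupling (row-oriented by default), followed by MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY). For an interlaced velocity-pressure layout (the Vanka-smoother case mentioned above) the same pattern applies with the corresponding block size, while a fully segregated layout would instead use nested AIJ blocks with fieldsplit.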
> > > > > Thanks, > > Justin > > From amneetb at live.unc.edu Wed Jan 13 23:12:46 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 05:12:46 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Message-ID: <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> On Jan 13, 2016, at 6:22 PM, Matthew Knepley > wrote: Can you mail us a -log_summary for a rough cut? Sometimes its hard to interpret the data avalanche from one of those tools without a simple map. Does this indicate some hot spots? ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016 Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: 2015-11-16 13:07:08 -0600 Max Max/Min Avg Total Time (sec): 1.039e+01 1.00000 1.039e+01 Objects: 2.834e+03 1.00000 2.834e+03 Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 Memory: 3.949e+07 1.00000 3.949e+07 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. 
# # # ########################################################## Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 
1 0 0 0 249 KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 870 762 13314200 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951096 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24202324 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 190080 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89128 0. SNES 1 1 1328 0. SNESLineSearch 1 1 856 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9024 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1696 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 4.74e-08 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_ksp_richardson_self_scae -stokes_ib_pc_level_ksp_type gmres -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_pc_asm_type interpolate -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ ----------------------------------------- Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit Using PETSc directory: /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc Using PETSc arch: darwin-dbg ----------------------------------------- Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using 
libraries: -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lpetsc -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -lclang_rt.osx -lmpicxx -lc++ -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl -lcrypto -lmpifort -lgfortran -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -ldl ----------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Wed Jan 13 23:17:21 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 05:17:21 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> I see one hot spot: On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S > wrote: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Jan 13 23:22:18 2016 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 13 Jan 2016 21:22:18 -0800 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: Thanks Satish, this worked. 
On Wed, Jan 13, 2016 at 9:49 AM, Satish Balay wrote: > >>>>>>>> > Executing: mpif90 -o > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest.o > Testing executable > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > to see if it can be run > Executing: > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > Executing: > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > ERROR while running executable: Could not execute > "/var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest": > dyld: Library not loaded: > /Users/markadams/homebrew/lib/gcc/x86_64-apple-darwin13.4.0/4.9.1/libgfortran.3.dylib > Referenced from: /Users/markadams/homebrew/lib/libmpifort.12.dylib > Reason: image not found > <<<<<<<<<<< > > Mostlikely you haven't reinstalled mpich - as its refering to > gfortran-4.9.1. Current gfortran is 5.3 > GNU Fortran (Homebrew gcc 5.3.0) 5.3.0 > > > This is what I would do to reinstall brew > > 1. Make list of pkgs to reinstall > > brew leaves > reinstall.lst > > 2. delete all installed brew pacakges. > > brew cleanup > brew list > delete.lst > brew remove `cat delete.lst > > 3. Now reinstall all required packages > brew update > brew install `cat reinstall.lst` > > > Satish > > > On Wed, 13 Jan 2016, Mark Adams wrote: > > > I'm still having problems. I have upgraded gcc and mpich. I am now > > upgrading everything from homebrew. Any ideas on this error? > > thanks, > > > > On Wed, Jan 13, 2016 at 2:15 AM, Matthew Knepley > wrote: > > > > > On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay > wrote: > > > > > >> > 'file' object has no attribute 'getvalue' File > > >> "/Users/markadams/Codes/petsc/config/configure.py", line 363, in > > >> petsc_configure > > >> > > >> Hm - have to figure this one out - but the primary issue is: > > >> > > >> > stderr: > > >> > gfortran: warning: couldn't understand kern.osversion '15.2.0 > > >> > ld: -rpath can only be used when targeting Mac OS X 10.5 or later > > >> > > > > > > I get this. The remedy I use is to put > > > > > > MACOSX_DEPLOYMENT_TARGET=10.5 > > > > > > in the environment. Its annoying, and quintessentially Mac. > > > > > > Matt > > > > > > > > >> Perhaps you've updated xcode or OSX - but did not reinstall > brew/gfortran. > > >> > > >> > Executing: mpif90 --version > > >> > stdout: > > >> > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 > > >> > > >> I suggest uninstalling/reinstalling homebrew packages. > > >> > > >> Satish > > >> > > >> > > >> > > >> On Tue, 12 Jan 2016, Mark Adams wrote: > > >> > > >> > I did nuke the arch directory. This has worked in the past and > don't > > >> know > > >> > what I might have changed. it's been awhile since I've > reconfigured. > > >> > Thanks, > > >> > Mark > > >> > > > >> > > >> > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > their > > > experiments lead. > > > -- Norbert Wiener > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From praveenpetsc at gmail.com Thu Jan 14 00:03:52 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Thu, 14 Jan 2016 11:33:52 +0530 Subject: [petsc-users] undefined reference error in make test Message-ID: I?ve written a fortan code (F90) for domain decomposition.* I've specified **the paths of include files and libraries, but the compiler/linker still * *complained about undefined references.undefined reference to `vectorset_'undefined reference to `dmdagetlocalinfo_'*I?m attaching makefile and code. any help will be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 326 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.F90 Type: text/x-fortran Size: 3078 bytes Desc: not available URL: From jychang48 at gmail.com Thu Jan 14 00:05:59 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 23:05:59 -0700 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: HPCToolkit for MacOSX doesn't require any installation. Just go to: http://hpctoolkit.org/download/hpcviewer/ and download this file: hpctraceviewer-5.4.2-r20160111-macosx.cocoa.x86_64.zip Important note: be sure to unzip the file via the terminal, not with Finder. It may screw up the GUI. On my MacOSX I had to "Download Linked File As..." Then you can drag the corresponding hpctraceviewer.app into your Applications directory. Now you *should* be good to go. Thanks, Justin On Wed, Jan 13, 2016 at 10:17 PM, Griffith, Boyce Eugene < boyceg at email.unc.edu> wrote: > I see one hot spot: > > On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S > wrote: > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Jan 14 00:13:31 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 23:13:31 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: Okay that makes sense, thanks On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: > > > > Thanks Barry, > > > > 1) So for block matrices, the ja array is smaller. But what's the > "hardware" explanation for this performance improvement? Does it have to do > with spatial locality where you are more likely to reuse data in that ja > array, or does it have to do with the fact that loading/storing smaller > arrays are less likely to invoke a cache miss, thus reducing the amount of > bandwidth? > > There are two distinct reasons for the improvement: > > 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" > savings is that you have to load something that is much smaller than > before. 
Cache/spatial locality have nothing to do with this particular > improvement. > > 2) The other improvement comes from the reuse of each x[j] value > multiplied by 5 values (a column) of the little block. The hardware > explanation is that x[j] can be reused in a register for the 5 multiplies > (while otherwise it would have to come from cache to register 5 times and > sometimes might even have been flushed from the cache so would have to come > from memory). This is why we have code like > > for (j=0; j xb = x + 5*(*idx++); > x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; > sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; > sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; > sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; > sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; > sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; > v += 25; > } > > to do the block multiple. > > > > > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of > more than one dof per point) then using the BAIJ format is highly > advisable. But if I want to form a nested matrix, say I am solving Stokes > equation, then each "submatrix" is of AIJ format? Can these sub matrices > also be BAIJ? > > Sure, but if you have separated all the variables of pressure, > velocity_x, velocity_y, etc into there own regions of the vector then the > block size for the sub matrices would be 1 so BAIJ does not help. > > There are Stokes solvers that use Vanka smoothing that keep the > variables interlaced and hence would use BAIJ and NOT use fieldsplit > > > > > > Thanks, > > Justin > > > > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > > > Hi all, > > > > > > 1) I am guessing MATMPIBAIJ could theoretically have better > performance than simply using MATMPIAIJ. Why is that? Is it similar to the > reasoning that block (dense) matrix-vector multiply is "faster" than simple > matrix-vector? > > > > See for example table 1 in > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > > > > 2) I am looking through the manual and online documentation and it > seems the term "block" used everywhere. In the section on "block matrices" > (3.1.3 of the manual), it refers to field splitting, where you could either > have a monolithic matrix or a nested matrix. Does that concept have > anything to do with MATMPIBAIJ? > > > > Unfortunately the numerical analysis literature uses the term block > in multiple ways. For small blocks, sometimes called "point-block" with > BAIJ and for very large blocks (where the blocks are sparse themselves). I > used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > > > It makes sense to me that one could create a BAIJ where if you have 5 > dofs of the same type of physics (e.g., five different primary species of a > geochemical reaction) per grid point, you could create a block size of 5. > And if you have different physics (e.g., velocity and pressure) you would > ideally want to separate them out (i.e., nested matrices) for better > preconditioning. > > > > Sometimes you put them together with BAIJ and sometimes you keep them > separate with nested matrices. > > > > > > > > Thanks, > > > Justin > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amneetb at live.unc.edu Thu Jan 14 00:19:42 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 06:19:42 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: Thanks! That worked for me. On Jan 13, 2016, at 10:05 PM, Justin Chang > wrote: HPCToolkit for MacOSX doesn't require any installation. Just go to: http://hpctoolkit.org/download/hpcviewer/ and download this file: hpctraceviewer-5.4.2-r20160111-macosx.cocoa.x86_64.zip Important note: be sure to unzip the file via the terminal, not with Finder. It may screw up the GUI. On my MacOSX I had to "Download Linked File As..." Then you can drag the corresponding hpctraceviewer.app into your Applications directory. Now you *should* be good to go. Thanks, Justin On Wed, Jan 13, 2016 at 10:17 PM, Griffith, Boyce Eugene > wrote: I see one hot spot: On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S > wrote: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 14 01:26:57 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 07:26:57 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene > wrote: I see one hot spot: Here is with opt build ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 Max Max/Min Avg Total Time (sec): 1.018e+00 1.00000 1.018e+00 Objects: 2.935e+03 1.00000 2.935e+03 Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 MatPtAP 4 1.0 4.4426e-02 
1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 971 839 15573352 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951928 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24083332 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 122720 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89872 0. SNES 1 1 1328 0. SNESLineSearch 1 1 984 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9168 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1712 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-07 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_0_sub_pc_type ilu -stokes_ib_pc_level_ksp_richardson_self_scale -stokes_ib_pc_level_ksp_type richardson -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc Using PETSc arch: linux-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl ----------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hgbk2008 at gmail.com Thu Jan 14 05:04:35 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Thu, 14 Jan 2016 12:04:35 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: This is a very interesting thread because use of block matrix improves the performance of AMG a lot. In my case is the elasticity problem. One more question I like to ask, which is more on the performance of the solver. That if I have a coupled problem, says the point block is [u_x u_y u_z p] in which entries of p block in stiffness matrix is in a much smaller scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? Also, is there a utility in PETSc which does automatic scaling of variables? Giang On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang wrote: > Okay that makes sense, thanks > > On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: > >> >> > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: >> > >> > Thanks Barry, >> > >> > 1) So for block matrices, the ja array is smaller. But what's the >> "hardware" explanation for this performance improvement? Does it have to do >> with spatial locality where you are more likely to reuse data in that ja >> array, or does it have to do with the fact that loading/storing smaller >> arrays are less likely to invoke a cache miss, thus reducing the amount of >> bandwidth? >> >> There are two distinct reasons for the improvement: >> >> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" >> savings is that you have to load something that is much smaller than >> before. Cache/spatial locality have nothing to do with this particular >> improvement. >> >> 2) The other improvement comes from the reuse of each x[j] value >> multiplied by 5 values (a column) of the little block. The hardware >> explanation is that x[j] can be reused in a register for the 5 multiplies >> (while otherwise it would have to come from cache to register 5 times and >> sometimes might even have been flushed from the cache so would have to come >> from memory). This is why we have code like >> >> for (j=0; j> xb = x + 5*(*idx++); >> x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; >> sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; >> sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; >> sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; >> sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; >> sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; >> v += 25; >> } >> >> to do the block multiple. >> >> > >> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation >> of more than one dof per point) then using the BAIJ format is highly >> advisable. But if I want to form a nested matrix, say I am solving Stokes >> equation, then each "submatrix" is of AIJ format? Can these sub matrices >> also be BAIJ? >> >> Sure, but if you have separated all the variables of pressure, >> velocity_x, velocity_y, etc into there own regions of the vector then the >> block size for the sub matrices would be 1 so BAIJ does not help. 
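As a concrete illustration of the point-block case discussed above, here is a
minimal sketch of assembling a block-size-5 BAIJ matrix. The sizes,
preallocation counts, and the single inserted block are made up purely for
illustration, and CHKERRQ error checking is omitted for brevity:

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      PetscInt    bs = 5, nblocks = 10;   /* 10 block rows/cols = 50 scalar rows/cols */
      PetscInt    row = 0, col = 0, i;
      PetscScalar block[25];              /* one 5x5 block, row-oriented by default   */

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* d_nz/o_nz preallocation is counted in blocks per block row */
      MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE,
                    bs*nblocks, bs*nblocks, 5, NULL, 2, NULL, &A);

      for (i = 0; i < 25; i++) block[i] = (PetscScalar)i;

      /* indices are in units of blocks, not scalar rows/columns */
      MatSetValuesBlocked(A, 1, &row, 1, &col, block, INSERT_VALUES);

      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }

With the matrix stored this way, MatMult() can use a block-size-5 kernel of
the kind quoted above, which is where the smaller column-index array and the
register reuse of each x[j] come from.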
>> >> There are Stokes solvers that use Vanka smoothing that keep the >> variables interlaced and hence would use BAIJ and NOT use fieldsplit >> >> >> > >> > Thanks, >> > Justin >> > >> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith >> wrote: >> > >> > > On Jan 13, 2016, at 9:57 PM, Justin Chang >> wrote: >> > > >> > > Hi all, >> > > >> > > 1) I am guessing MATMPIBAIJ could theoretically have better >> performance than simply using MATMPIAIJ. Why is that? Is it similar to the >> reasoning that block (dense) matrix-vector multiply is "faster" than simple >> matrix-vector? >> > >> > See for example table 1 in >> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf >> > >> > > >> > > 2) I am looking through the manual and online documentation and it >> seems the term "block" used everywhere. In the section on "block matrices" >> (3.1.3 of the manual), it refers to field splitting, where you could either >> have a monolithic matrix or a nested matrix. Does that concept have >> anything to do with MATMPIBAIJ? >> > >> > Unfortunately the numerical analysis literature uses the term block >> in multiple ways. For small blocks, sometimes called "point-block" with >> BAIJ and for very large blocks (where the blocks are sparse themselves). I >> used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. >> > > >> > > It makes sense to me that one could create a BAIJ where if you have 5 >> dofs of the same type of physics (e.g., five different primary species of a >> geochemical reaction) per grid point, you could create a block size of 5. >> And if you have different physics (e.g., velocity and pressure) you would >> ideally want to separate them out (i.e., nested matrices) for better >> preconditioning. >> > >> > Sometimes you put them together with BAIJ and sometimes you keep >> them separate with nested matrices. >> > >> > > >> > > Thanks, >> > > Justin >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 14 07:24:54 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 07:24:54 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S wrote: > > > On Jan 13, 2016, at 6:22 PM, Matthew Knepley wrote: > > Can you mail us a -log_summary for a rough cut? Sometimes its hard > to interpret the data avalanche from one of those tools without a simple > map. > > > Does this indicate some hot spots? > 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae You can try to avoid this by giving -options_left 2) Are you using any custom code during the solve? There is a gaping whole in the timing. It take 9s to do PCApply(), but something like a collective 1s to do everything we time under that. Since this is serial, we can use something like kcachegrind to look at performance as well, which should at least tell us what is sucking up this time so we can put a PETSc even on it. Thanks, Matt > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, > by Taylor Wed Jan 13 21:07:43 2016 > Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: > 2015-11-16 13:07:08 -0600 > > Max Max/Min Avg Total > Time (sec): 1.039e+01 1.00000 1.039e+01 > Objects: 2.834e+03 1.00000 2.834e+03 > Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 > Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 > Memory: 3.949e+07 1.00000 3.949e+07 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. 
# > # # > ########################################################## > > > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 > VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 > VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 > VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 > VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 > VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 > VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 > VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 > VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 > VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 > BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 > MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 > MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 > MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 > MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 > MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 > MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 > MatGetSymTrans 4 1.0 
5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 249 > KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 > PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 > PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 > PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 > SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 > SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 > SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 870 762 13314200 0. > Vector Scatter 290 289 189584 0. > Index Set 1171 823 951096 0. > IS L to G Mapping 110 109 2156656 0. > Application Order 6 6 99952 0. > MatMFFD 1 1 776 0. > Matrix 189 189 24202324 0. > Matrix Null Space 4 4 2432 0. > Krylov Solver 90 90 190080 0. > DMKSP interface 1 1 648 0. > Preconditioner 90 90 89128 0. > SNES 1 1 1328 0. > SNESLineSearch 1 1 856 0. > DMSNES 1 1 664 0. > Distributed Mesh 2 2 9024 0. > Star Forest Bipartite Graph 4 4 3168 0. > Discrete System 2 2 1696 0. > Viewer 1 0 0 0. 
> > ======================================================================================================================== > Average time to get PetscTime(): 4.74e-08 > #PETSc Option Table entries: > -ib_ksp_converged_reason > -ib_ksp_monitor_true_residual > -ib_snes_type ksponly > -log_summary > -stokes_ib_pc_level_ksp_richardson_self_scae > -stokes_ib_pc_level_ksp_type gmres > -stokes_ib_pc_level_pc_asm_local_type additive > -stokes_ib_pc_level_pc_asm_type interpolate > -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal > -stokes_ib_pc_level_sub_pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 > --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 > --with-hypre=1 --download-hypre=1 --with-hdf5=yes > --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ > ----------------------------------------- > Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu > Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit > Using PETSc directory: > /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc > Using PETSc arch: darwin-dbg > ----------------------------------------- > > Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include > -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include > -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -lpetsc > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib > -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib > -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin > -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin > -lclang_rt.osx -lmpicxx -lc++ > -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib > -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl > -lcrypto -lmpifort -lgfortran > -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 > -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 > -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran > -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx 
-lc++ -lclang_rt.osx > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib > -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem > -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -lclang_rt.osx -ldl > ----------------------------------------- > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu Jan 14 07:37:11 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 14 Jan 2016 14:37:11 +0100 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: On 14 January 2016 at 14:24, Matthew Knepley wrote: > On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S < > amneetb at live.unc.edu> wrote: > >> >> >> On Jan 13, 2016, at 6:22 PM, Matthew Knepley wrote: >> >> Can you mail us a -log_summary for a rough cut? Sometimes its hard >> to interpret the data avalanche from one of those tools without a simple >> map. >> >> >> Does this indicate some hot spots? >> > > 1) There is a misspelled option -stokes_ib_pc_level_ksp_ > richardson_self_scae > > You can try to avoid this by giving -options_left > > 2) Are you using any custom code during the solve? There is a gaping whole > in the timing. It take 9s to > do PCApply(), but something like a collective 1s to do everything we > time under that. > You are looking at the timing from a debug build. The results from the optimized build don't have such a gaping hole. > > Since this is serial, we can use something like kcachegrind to look at > performance as well, which should > at least tell us what is sucking up this time so we can put a PETSc even > on it. > > Thanks, > > Matt > > > >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 >> processor, by Taylor Wed Jan 13 21:07:43 2016 >> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: >> 2015-11-16 13:07:08 -0600 >> >> Max Max/Min Avg Total >> Time (sec): 1.039e+01 1.00000 1.039e+01 >> Objects: 2.834e+03 1.00000 2.834e+03 >> Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 >> Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 >> Memory: 3.949e+07 1.00000 3.949e+07 >> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). >> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message >> lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> >> ########################################################## >> # # >> # WARNING!!! # >> # # >> # This code was compiled with a debugging option, # >> # To get timing results run ./configure # >> # using --with-debugging=no, the performance will # >> # be generally two or three times faster. 
# >> # # >> ########################################################## >> >> >> Event Count Time (sec) Flops >> --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 >> VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 >> VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 >> VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 >> VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 >> VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >> VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 >> VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 >> VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 >> VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 >> BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 >> MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 >> MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 >> MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 >> MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >> MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 >> MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 
0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 >> MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 249 >> KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >> PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 >> PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 >> PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 >> SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >> SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 >> SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Vector 870 762 13314200 0. >> Vector Scatter 290 289 189584 0. >> Index Set 1171 823 951096 0. >> IS L to G Mapping 110 109 2156656 0. >> Application Order 6 6 99952 0. >> MatMFFD 1 1 776 0. >> Matrix 189 189 24202324 0. >> Matrix Null Space 4 4 2432 0. >> Krylov Solver 90 90 190080 0. >> DMKSP interface 1 1 648 0. >> Preconditioner 90 90 89128 0. >> SNES 1 1 1328 0. >> SNESLineSearch 1 1 856 0. >> DMSNES 1 1 664 0. >> Distributed Mesh 2 2 9024 0. >> Star Forest Bipartite Graph 4 4 3168 0. >> Discrete System 2 2 1696 0. >> Viewer 1 0 0 0. 
>> >> ======================================================================================================================== >> Average time to get PetscTime(): 4.74e-08 >> #PETSc Option Table entries: >> -ib_ksp_converged_reason >> -ib_ksp_monitor_true_residual >> -ib_snes_type ksponly >> -log_summary >> -stokes_ib_pc_level_ksp_richardson_self_scae >> -stokes_ib_pc_level_ksp_type gmres >> -stokes_ib_pc_level_pc_asm_local_type additive >> -stokes_ib_pc_level_pc_asm_type interpolate >> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >> -stokes_ib_pc_level_sub_pc_type lu >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 >> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 >> --with-hypre=1 --download-hypre=1 --with-hdf5=yes >> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ >> ----------------------------------------- >> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu >> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit >> Using PETSc directory: >> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc >> Using PETSc arch: darwin-dbg >> ----------------------------------------- >> >> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} >> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} >> ----------------------------------------- >> >> Using include paths: >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include >> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -lpetsc >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib >> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >> -lclang_rt.osx -lmpicxx -lc++ >> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib >> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl >> -lcrypto -lmpifort -lgfortran >> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 
-lgfortran >> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem >> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -lclang_rt.osx -ldl >> ----------------------------------------- >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Thu Jan 14 07:39:17 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Thu, 14 Jan 2016 08:39:17 -0500 Subject: [petsc-users] compiler error In-Reply-To: References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: I know I did a git pull recently, but when did that change? What?s the fifth argument represent? -gideon > On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > > On Wed, 13 Jan 2016, Gideon Simpson wrote: > >> I haven?t seen this before: >> >> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); >> ^ >> >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > > Try: > > PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > > Satish -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 14 07:44:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 07:44:47 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: On Thu, Jan 14, 2016 at 7:37 AM, Dave May wrote: > > > On 14 January 2016 at 14:24, Matthew Knepley wrote: > >> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S < >> amneetb at live.unc.edu> wrote: >> >>> >>> >>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley wrote: >>> >>> Can you mail us a -log_summary for a rough cut? Sometimes its hard >>> to interpret the data avalanche from one of those tools without a simple >>> map. >>> >>> >>> Does this indicate some hot spots? >>> >> >> 1) There is a misspelled option -stokes_ib_pc_level_ksp_ >> richardson_self_scae >> >> You can try to avoid this by giving -options_left >> >> 2) Are you using any custom code during the solve? There is a gaping >> whole in the timing. It take 9s to >> do PCApply(), but something like a collective 1s to do everything we >> time under that. >> > > > You are looking at the timing from a debug build. > The results from the optimized build don't have such a gaping hole. > It still looks like 50% of the runtime to me. 
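If the unaccounted time does turn out to be in application code rather than
inside PETSc, one way to make it visible in -log_summary is to bracket the
suspect code with a user-defined logging event. A minimal sketch, following
the pattern in the PETSc users' manual (the class and event names here are
invented for illustration):

    #include <petscsys.h>

    static PetscLogEvent USER_EVENT;

    PetscErrorCode TimeUserCode(void)
    {
      PetscErrorCode ierr;
      PetscClassId   classid;

      PetscFunctionBeginUser;
      /* registration: typically done once during setup, after PetscInitialize() */
      ierr = PetscClassIdRegister("Application", &classid);CHKERRQ(ierr);
      ierr = PetscLogEventRegister("UserIBOps", classid, &USER_EVENT);CHKERRQ(ierr);

      /* wrap the code that is currently invisible to the profiler */
      ierr = PetscLogEventBegin(USER_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
      /* ... the application code in question ... */
      ierr = PetscLogEventEnd(USER_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The event then gets its own row in the -log_summary table (time, flops if
PetscLogFlops() is called, and percent of total run time), which makes holes
like the one above much easier to attribute.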
Matt > >> Since this is serial, we can use something like kcachegrind to look at >> performance as well, which should >> at least tell us what is sucking up this time so we can put a PETSc even >> on it. >> >> Thanks, >> >> Matt >> >> >> >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >>> -fCourier9' to print this document *** >>> >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance >>> Summary: ---------------------------------------------- >>> >>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 >>> processor, by Taylor Wed Jan 13 21:07:43 2016 >>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: >>> 2015-11-16 13:07:08 -0600 >>> >>> Max Max/Min Avg Total >>> Time (sec): 1.039e+01 1.00000 1.039e+01 >>> Objects: 2.834e+03 1.00000 2.834e+03 >>> Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 >>> Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 >>> Memory: 3.949e+07 1.00000 3.949e+07 >>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.00000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type >>> (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N >>> --> 2N flops >>> and VecAXPY() for complex vectors of length >>> N --> 8N flops >>> >>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >>> --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total counts >>> %Total Avg %Total counts %Total >>> 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 >>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on >>> interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flops: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all >>> processors >>> Mess: number of messages sent >>> Avg. len: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() >>> and PetscLogStagePop(). >>> %T - percent time in this phase %F - percent flops in this >>> phase >>> %M - percent messages in this phase %L - percent message >>> lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >>> over all processors) >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> >>> ########################################################## >>> # # >>> # WARNING!!! # >>> # # >>> # This code was compiled with a debugging option, # >>> # To get timing results run ./configure # >>> # using --with-debugging=no, the performance will # >>> # be generally two or three times faster. 
# >>> # # >>> ########################################################## >>> >>> >>> Event Count Time (sec) Flops >>> --- Global --- --- Stage --- Total >>> Max Ratio Max Ratio Max Ratio Mess Avg len >>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 >>> VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 >>> VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 >>> VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 >>> VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 >>> VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >>> VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 >>> VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 >>> VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 >>> VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 >>> BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 >>> MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 >>> MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 >>> MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 >>> MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 >>> MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 
1 0 0 0 0 1 0 0 0 0 0 >>> MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 >>> MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 249 >>> KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >>> PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 >>> PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 >>> PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 >>> SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >>> SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 >>> SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' >>> Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Vector 870 762 13314200 0. >>> Vector Scatter 290 289 189584 0. >>> Index Set 1171 823 951096 0. >>> IS L to G Mapping 110 109 2156656 0. >>> Application Order 6 6 99952 0. >>> MatMFFD 1 1 776 0. >>> Matrix 189 189 24202324 0. >>> Matrix Null Space 4 4 2432 0. >>> Krylov Solver 90 90 190080 0. >>> DMKSP interface 1 1 648 0. >>> Preconditioner 90 90 89128 0. >>> SNES 1 1 1328 0. >>> SNESLineSearch 1 1 856 0. >>> DMSNES 1 1 664 0. >>> Distributed Mesh 2 2 9024 0. >>> Star Forest Bipartite Graph 4 4 3168 0. >>> Discrete System 2 2 1696 0. >>> Viewer 1 0 0 0. 
>>> >>> ======================================================================================================================== >>> Average time to get PetscTime(): 4.74e-08 >>> #PETSc Option Table entries: >>> -ib_ksp_converged_reason >>> -ib_ksp_monitor_true_residual >>> -ib_snes_type ksponly >>> -log_summary >>> -stokes_ib_pc_level_ksp_richardson_self_scae >>> -stokes_ib_pc_level_ksp_type gmres >>> -stokes_ib_pc_level_pc_asm_local_type additive >>> -stokes_ib_pc_level_pc_asm_type interpolate >>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >>> -stokes_ib_pc_level_sub_pc_type lu >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 >>> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 >>> --with-hypre=1 --download-hypre=1 --with-hdf5=yes >>> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ >>> ----------------------------------------- >>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu >>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit >>> Using PETSc directory: >>> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc >>> Using PETSc arch: darwin-dbg >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} >>> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} >>> ----------------------------------------- >>> >>> Using include paths: >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include >>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -lpetsc >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib >>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >>> -lclang_rt.osx -lmpicxx -lc++ >>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib >>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl >>> -lcrypto -lmpifort -lgfortran >>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >>> 
-L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran >>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem >>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -lclang_rt.osx -ldl >>> ----------------------------------------- >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Thu Jan 14 08:30:38 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 14:30:38 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: <960248D8-89F9-492D-A7EA-C503722E73C9@email.unc.edu> On Jan 14, 2016, at 8:44 AM, Matthew Knepley > wrote: On Thu, Jan 14, 2016 at 7:37 AM, Dave May > wrote: On 14 January 2016 at 14:24, Matthew Knepley > wrote: On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S > wrote: On Jan 13, 2016, at 6:22 PM, Matthew Knepley > wrote: Can you mail us a -log_summary for a rough cut? Sometimes its hard to interpret the data avalanche from one of those tools without a simple map. Does this indicate some hot spots? 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae You can try to avoid this by giving -options_left 2) Are you using any custom code during the solve? There is a gaping whole in the timing. It take 9s to do PCApply(), but something like a collective 1s to do everything we time under that. You are looking at the timing from a debug build. The results from the optimized build don't have such a gaping hole. It still looks like 50% of the runtime to me. Amneet, on OS X, I would echo Barry and suggest starting out using the timer profiler instrument (accessible through the Instruments app). -- Boyce Matt Since this is serial, we can use something like kcachegrind to look at performance as well, which should at least tell us what is sucking up this time so we can put a PETSc even on it. Thanks, Matt ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016 Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: 2015-11-16 13:07:08 -0600 Max Max/Min Avg Total Time (sec): 1.039e+01 1.00000 1.039e+01 Objects: 2.834e+03 1.00000 2.834e+03 Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 Memory: 3.949e+07 1.00000 3.949e+07 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. 
# # # ########################################################## Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 
1 0 0 0 249 KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 870 762 13314200 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951096 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24202324 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 190080 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89128 0. SNES 1 1 1328 0. SNESLineSearch 1 1 856 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9024 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1696 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 4.74e-08 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_ksp_richardson_self_scae -stokes_ib_pc_level_ksp_type gmres -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_pc_asm_type interpolate -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ ----------------------------------------- Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit Using PETSc directory: /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc Using PETSc arch: darwin-dbg ----------------------------------------- Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using 
libraries: -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lpetsc -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -lclang_rt.osx -lmpicxx -lc++ -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl -lcrypto -lmpifort -lgfortran -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -ldl ----------------------------------------- -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jan 14 09:42:44 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 14 Jan 2016 09:42:44 -0600 Subject: [petsc-users] compiler error In-Reply-To: References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: Hopefully all changes should be documented in the changes file.. http://www.mcs.anl.gov/petsc/documentation/changes/dev.html You can use git to find out more info.. 
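For reference, a minimal sketch of the updated calling sequence, assuming the master-branch signature that the git grep transcript below reports; the -xmax option and variable name come from the quoted compile error, and everything else here (the driver program, the default value) is illustrative rather than part of the original report:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscScalar xmax = 1.0;          /* value used if -xmax is not given */
  PetscBool   set  = PETSC_FALSE;

  PetscInitialize(&argc, &argv, NULL, NULL);   /* error checking omitted for brevity */
  /* New interface: argument 1 is the options database (NULL = the global database),
     argument 2 is an optional options prefix (NULL = none), and the final argument
     reports whether -xmax was actually found on the command line. */
  PetscOptionsGetScalar(NULL, NULL, "-xmax", &xmax, &set);
  PetscPrintf(PETSC_COMM_WORLD, "xmax = %g (set = %d)\n", (double)PetscRealPart(xmax), (int)set);
  PetscFinalize();
  return 0;
}
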
balay at asterix /home/balay/petsc (master=) $ git grep PetscOptionsGetScalar include/ include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); balay at asterix /home/balay/petsc (hzhang/update-networkex=) $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 20)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); balay at asterix /home/balay/petsc (hzhang/update-networkex=) $ git show -q c5929fdf commit c5929fdf3082647d199855a5c1d0286204349b03 Author: Barry Smith Date: Fri Oct 30 21:20:21 2015 -0500 Complete update to new PetscOptions interface balay at asterix /home/balay/petsc (hzhang/update-networkex=) $ gitk c5929fdf etc.. Satish On Thu, 14 Jan 2016, Gideon Simpson wrote: > I know I did a git pull recently, but when did that change? What?s the fifth argument represent? > > -gideon > > > On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > > > > On Wed, 13 Jan 2016, Gideon Simpson wrote: > > > >> I haven?t seen this before: > >> > >> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c > >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" > >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > >> ^ > >> > >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call > >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > > > > Try: > > > > PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > > > > Satish > > From knepley at gmail.com Thu Jan 14 10:09:46 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 10:09:46 -0600 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar wrote: > I?ve written a fortan code (F90) for domain decomposition.* I've > specified **the paths of include files and libraries, but the > compiler/linker still * > > > *complained about undefined references.undefined reference to `vectorset_'* > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > *undefined reference to `dmdagetlocalinfo_'* > This function is not supported in Fortran since it takes a structure. Thanks, Matt > I?m attaching makefile and code. any help will be appreciated. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 14 10:20:30 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 10:20:30 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: On Thu, Jan 14, 2016 at 5:04 AM, Hoang Giang Bui wrote: > This is a very interesting thread because use of block matrix improves the > performance of AMG a lot. In my case is the elasticity problem. > > One more question I like to ask, which is more on the performance of the > solver. That if I have a coupled problem, says the point block is [u_x u_y > u_z p] in which entries of p block in stiffness matrix is in a much smaller > scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? > Also, is there a utility in PETSc which does automatic scaling of variables? > You could use PC Jacobi, or perhaps http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetDiagonalScale.html Thanks, Matt > Giang > > On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang wrote: > >> Okay that makes sense, thanks >> >> On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: >> >>> >>> > On Jan 13, 2016, at 10:24 PM, Justin Chang >>> wrote: >>> > >>> > Thanks Barry, >>> > >>> > 1) So for block matrices, the ja array is smaller. But what's the >>> "hardware" explanation for this performance improvement? Does it have to do >>> with spatial locality where you are more likely to reuse data in that ja >>> array, or does it have to do with the fact that loading/storing smaller >>> arrays are less likely to invoke a cache miss, thus reducing the amount of >>> bandwidth? >>> >>> There are two distinct reasons for the improvement: >>> >>> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" >>> savings is that you have to load something that is much smaller than >>> before. Cache/spatial locality have nothing to do with this particular >>> improvement. >>> >>> 2) The other improvement comes from the reuse of each x[j] value >>> multiplied by 5 values (a column) of the little block. The hardware >>> explanation is that x[j] can be reused in a register for the 5 multiplies >>> (while otherwise it would have to come from cache to register 5 times and >>> sometimes might even have been flushed from the cache so would have to come >>> from memory). This is why we have code like >>> >>> for (j=0; j>> xb = x + 5*(*idx++); >>> x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; >>> sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; >>> sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; >>> sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; >>> sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; >>> sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; >>> v += 25; >>> } >>> >>> to do the block multiple. >>> >>> > >>> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation >>> of more than one dof per point) then using the BAIJ format is highly >>> advisable. But if I want to form a nested matrix, say I am solving Stokes >>> equation, then each "submatrix" is of AIJ format? Can these sub matrices >>> also be BAIJ? >>> >>> Sure, but if you have separated all the variables of pressure, >>> velocity_x, velocity_y, etc into there own regions of the vector then the >>> block size for the sub matrices would be 1 so BAIJ does not help. 
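A self-contained sketch of the 5-by-5 point-block row product quoted earlier in this thread: the loop body follows the quoted kernel, while the function name, signature, and the assumption that the block row holds n nonzero blocks stored column-by-column are illustrative and not taken from the PETSc source.

#include <petscsys.h>

/* y = (one block row of A) * x for BAIJ with block size 5.
   n   - number of nonzero 5x5 blocks in this block row
   idx - block-column indices of those blocks
   v   - the blocks themselves, stored by column (25 scalars per block)
   x   - the full input vector; y - the 5 output values for this block row */
static void BlockRowMult_5(PetscInt n, const PetscInt *idx,
                           const PetscScalar *v, const PetscScalar *x,
                           PetscScalar y[5])
{
  PetscScalar sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0;
  PetscScalar x1, x2, x3, x4, x5;
  const PetscScalar *xb;
  PetscInt j;

  for (j = 0; j < n; j++) {
    xb = x + 5*(*idx++);                      /* x block for this block column */
    x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
    sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5;
    sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5;
    sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5;
    sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5;
    sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5;
    v += 25;                                  /* advance to the next 5x5 block */
  }
  y[0] = sum1; y[1] = sum2; y[2] = sum3; y[3] = sum4; y[4] = sum5;
}

Each x value is loaded once and then reused across the five rows of the block, which is exactly the register reuse described above; the column-index array is also one entry per block rather than one per scalar nonzero.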
>>> >>> There are Stokes solvers that use Vanka smoothing that keep the >>> variables interlaced and hence would use BAIJ and NOT use fieldsplit >>> >>> >>> > >>> > Thanks, >>> > Justin >>> > >>> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith >>> wrote: >>> > >>> > > On Jan 13, 2016, at 9:57 PM, Justin Chang >>> wrote: >>> > > >>> > > Hi all, >>> > > >>> > > 1) I am guessing MATMPIBAIJ could theoretically have better >>> performance than simply using MATMPIAIJ. Why is that? Is it similar to the >>> reasoning that block (dense) matrix-vector multiply is "faster" than simple >>> matrix-vector? >>> > >>> > See for example table 1 in >>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf >>> > >>> > > >>> > > 2) I am looking through the manual and online documentation and it >>> seems the term "block" used everywhere. In the section on "block matrices" >>> (3.1.3 of the manual), it refers to field splitting, where you could either >>> have a monolithic matrix or a nested matrix. Does that concept have >>> anything to do with MATMPIBAIJ? >>> > >>> > Unfortunately the numerical analysis literature uses the term block >>> in multiple ways. For small blocks, sometimes called "point-block" with >>> BAIJ and for very large blocks (where the blocks are sparse themselves). I >>> used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. >>> > > >>> > > It makes sense to me that one could create a BAIJ where if you have >>> 5 dofs of the same type of physics (e.g., five different primary species of >>> a geochemical reaction) per grid point, you could create a block size of 5. >>> And if you have different physics (e.g., velocity and pressure) you would >>> ideally want to separate them out (i.e., nested matrices) for better >>> preconditioning. >>> > >>> > Sometimes you put them together with BAIJ and sometimes you keep >>> them separate with nested matrices. >>> > >>> > > >>> > > Thanks, >>> > > Justin >>> > >>> > >>> >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Thu Jan 14 11:40:08 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Thu, 14 Jan 2016 12:40:08 -0500 Subject: [petsc-users] compiler error In-Reply-To: References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> Is this change going to be part of the next patch release, or the eventual 3.7? -gideon > On Jan 14, 2016, at 10:42 AM, Satish Balay wrote: > > Hopefully all changes should be documented in the changes file.. > > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html > > You can use git to find out more info.. 
> > balay at asterix /home/balay/petsc (master=) > $ git grep PetscOptionsGetScalar include/ > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 20)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git show -q c5929fdf > commit c5929fdf3082647d199855a5c1d0286204349b03 > Author: Barry Smith > Date: Fri Oct 30 21:20:21 2015 -0500 > > Complete update to new PetscOptions interface > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ gitk c5929fdf > > etc.. > > Satish > > > On Thu, 14 Jan 2016, Gideon Simpson wrote: > >> I know I did a git pull recently, but when did that change? What?s the fifth argument represent? >> >> -gideon >> >>> On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: >>> >>> On Wed, 13 Jan 2016, Gideon Simpson wrote: >>> >>>> I haven?t seen this before: >>>> >>>> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); >>>> ^ >>>> >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); >>> >>> Try: >>> >>> PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); >>> >>> Satish >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jan 14 11:46:35 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 14 Jan 2016 11:46:35 -0600 Subject: [petsc-users] compiler error In-Reply-To: <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> Message-ID: future full release [3.7] will be from 'master' branch. future patch fix release [3.6.x] will be from 'maint' branch. You can choose the branch to use - based on your need.. Satish On Thu, 14 Jan 2016, Gideon Simpson wrote: > Is this change going to be part of the next patch release, or the eventual 3.7? > > -gideon > > > On Jan 14, 2016, at 10:42 AM, Satish Balay wrote: > > > > Hopefully all changes should be documented in the changes file.. > > > > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html > > > > You can use git to find out more info.. 
> > > > balay at asterix /home/balay/petsc (master=) > > $ git grep PetscOptionsGetScalar include/ > > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > > $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar > > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 20)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > > $ git show -q c5929fdf > > commit c5929fdf3082647d199855a5c1d0286204349b03 > > Author: Barry Smith > > Date: Fri Oct 30 21:20:21 2015 -0500 > > > > Complete update to new PetscOptions interface > > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > > $ gitk c5929fdf > > > > etc.. > > > > Satish > > > > > > On Thu, 14 Jan 2016, Gideon Simpson wrote: > > > >> I know I did a git pull recently, but when did that change? What?s the fifth argument represent? > >> > >> -gideon > >> > >>> On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > >>> > >>> On Wed, 13 Jan 2016, Gideon Simpson wrote: > >>> > >>>> I haven?t seen this before: > >>>> > >>>> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c > >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" > >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > >>>> ^ > >>>> > >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call > >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > >>> > >>> Try: > >>> > >>> PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > >>> > >>> Satish > >> > >> > > From knepley at gmail.com Thu Jan 14 11:52:44 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 11:52:44 -0600 Subject: [petsc-users] compiler error In-Reply-To: <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> Message-ID: On Thu, Jan 14, 2016 at 11:40 AM, Gideon Simpson wrote: > Is this change going to be part of the next patch release, or the eventual > 3.7? > Its in master, so it will be 3.7 Thanks, Matt > -gideon > > On Jan 14, 2016, at 10:42 AM, Satish Balay wrote: > > Hopefully all changes should be documented in the changes file.. > > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html > > You can use git to find out more info.. 
> > balay at asterix /home/balay/petsc (master=) > $ git grep PetscOptionsGetScalar include/ > include/petscoptions.h:PETSC_EXTERN PetscErrorCode > PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar > *,PetscBool *); > include/petscoptions.h:PETSC_EXTERN PetscErrorCode > PetscOptionsGetScalarArray(PetscOptions,const char[],const > char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 > 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const > char[],const char[],PetscScalar *,PetscBool *); > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 > 20)PETSC_EXTERN PetscErrorCode > PetscOptionsGetScalarArray(PetscOptions,const char[],const > char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git show -q c5929fdf > commit c5929fdf3082647d199855a5c1d0286204349b03 > Author: Barry Smith > Date: Fri Oct 30 21:20:21 2015 -0500 > > Complete update to new PetscOptions interface > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ gitk c5929fdf > > etc.. > > Satish > > > On Thu, 14 Jan 2016, Gideon Simpson wrote: > > I know I did a git pull recently, but when did that change? What?s the > fifth argument represent? > > -gideon > > On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > > On Wed, 13 Jan 2016, Gideon Simpson wrote: > > I haven?t seen this before: > > /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o > -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include > -I/home/simpson/software/petsc/arch-linux2-c-debug/include > -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall > `pwd`/fixed_batch.c > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: > argument of type "PetscScalar={PetscReal={double}} *" is incompatible with > parameter of type "const char *" > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > ^ > > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few > arguments in function call > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > > > Try: > > PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > > Satish > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 14 12:50:50 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 12:50:50 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: > On Jan 14, 2016, at 5:04 AM, Hoang Giang Bui wrote: > > This is a very interesting thread because use of block matrix improves the performance of AMG a lot. In my case is the elasticity problem. > > One more question I like to ask, which is more on the performance of the solver. That if I have a coupled problem, says the point block is [u_x u_y u_z p] in which entries of p block in stiffness matrix is in a much smaller scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? Also, is there a utility in PETSc which does automatic scaling of variables? 
We highly recommend scaling in your MODEL (as much as possible) to have similar scaling of the various variables see https://en.wikipedia.org/wiki/Nondimensionalization. The problem with trying to do the scaling numerically after you have discretized your model is that the effect of the finite arithmetic as you "rescale" means that you lose possibly all the accuracy during the rescaling. For example say your "badly scaled" matrix J has a condition number of 1.e15; now you apply a numerical algorithm to "rescale" the variables to get a much better conditioned matrix. Since the accuracy of the numerical algorithm depends on the conditioning of J it will give you essentially no digits correct (due to the finite arithmetic) in your new J prime and in the transformation between your new and old variables. Barry > > Giang > > On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang wrote: > Okay that makes sense, thanks > > On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: > > > > Thanks Barry, > > > > 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth? > > There are two distinct reasons for the improvement: > > 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" savings is that you have to load something that is much smaller than before. Cache/spatial locality have nothing to do with this particular improvement. > > 2) The other improvement comes from the reuse of each x[j] value multiplied by 5 values (a column) of the little block. The hardware explanation is that x[j] can be reused in a register for the 5 multiplies (while otherwise it would have to come from cache to register 5 times and sometimes might even have been flushed from the cache so would have to come from memory). This is why we have code like > > for (j=0; j xb = x + 5*(*idx++); > x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; > sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; > sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; > sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; > sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; > sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; > v += 25; > } > > to do the block multiple. > > > > > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ? > > Sure, but if you have separated all the variables of pressure, velocity_x, velocity_y, etc into there own regions of the vector then the block size for the sub matrices would be 1 so BAIJ does not help. > > There are Stokes solvers that use Vanka smoothing that keep the variables interlaced and hence would use BAIJ and NOT use fieldsplit > > > > > > Thanks, > > Justin > > > > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > > > Hi all, > > > > > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? 
Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? > > > > See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > > > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? > > > > Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. > > > > Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices. > > > > > > > > Thanks, > > > Justin > > > > > > > From jed at jedbrown.org Thu Jan 14 12:57:37 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 14 Jan 2016 11:57:37 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: <87fuy07zvi.fsf@jedbrown.org> Hoang Giang Bui writes: > One more question I like to ask, which is more on the performance of the > solver. That if I have a coupled problem, says the point block is [u_x u_y > u_z p] in which entries of p block in stiffness matrix is in a much smaller > scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? You should scale the model (as Barry says). But the names of your variables suggest that the system is a saddle point problem, in which case there's a good chance AMG won't work at all. For example, BoomerAMG produces a singular preconditioner in similar contexts, such that the preconditioned residual drops smoothly while the true residual stagnates (the equations are not solved at all). So be vary careful if you think it's "working". -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Jan 14 13:08:15 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 13:08:15 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: <87fuy07zvi.fsf@jedbrown.org> References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: > > Hoang Giang Bui writes: >> One more question I like to ask, which is more on the performance of the >> solver. That if I have a coupled problem, says the point block is [u_x u_y >> u_z p] in which entries of p block in stiffness matrix is in a much smaller >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? > > You should scale the model (as Barry says). But the names of your > variables suggest that the system is a saddle point problem, in which > case there's a good chance AMG won't work at all. 
For example, > BoomerAMG produces a singular preconditioner in similar contexts, such > that the preconditioned residual drops smoothly while the true residual > stagnates (the equations are not solved at all). So be vary careful if > you think it's "working". The PCFIEDSPLIT preconditioner is designed for helping to solve saddle point problems. Barry From bsmith at mcs.anl.gov Thu Jan 14 13:24:25 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 13:24:25 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. From the output we have: Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) Time to set up the preconditioner is 19% (10 + 9) Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). Barry > On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >> >> I see one hot spot: > > > Here is with opt build > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 > Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 > > Max Max/Min Avg Total > Time (sec): 1.018e+00 1.00000 1.018e+00 > Objects: 2.935e+03 1.00000 2.935e+03 > Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 > Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flops > and VecAXPY() for complex vectors of length N --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total Avg %Total counts %Total > 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting output. 
> Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 > VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 > VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 > VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 > VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 > VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 > VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 > VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 > VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 > VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 > VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 > MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 > MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 > MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 > MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 > MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > 
MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 > MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 > MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 > PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 > PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 > PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 > SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 > SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 > SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 971 839 15573352 0. > Vector Scatter 290 289 189584 0. > Index Set 1171 823 951928 0. > IS L to G Mapping 110 109 2156656 0. > Application Order 6 6 99952 0. > MatMFFD 1 1 776 0. > Matrix 189 189 24083332 0. > Matrix Null Space 4 4 2432 0. > Krylov Solver 90 90 122720 0. > DMKSP interface 1 1 648 0. > Preconditioner 90 90 89872 0. > SNES 1 1 1328 0. > SNESLineSearch 1 1 984 0. > DMSNES 1 1 664 0. > Distributed Mesh 2 2 9168 0. > Star Forest Bipartite Graph 4 4 3168 0. > Discrete System 2 2 1712 0. > Viewer 1 0 0 0. 
> ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-07 > #PETSc Option Table entries: > -ib_ksp_converged_reason > -ib_ksp_monitor_true_residual > -ib_snes_type ksponly > -log_summary > -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal > -stokes_ib_pc_level_0_sub_pc_type ilu > -stokes_ib_pc_level_ksp_richardson_self_scale > -stokes_ib_pc_level_ksp_type richardson > -stokes_ib_pc_level_pc_asm_local_type additive > -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal > -stokes_ib_pc_level_sub_pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 > ----------------------------------------- > Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta > Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty > Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc > Using PETSc arch: linux-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl > ----------------------------------------- From boyceg at email.unc.edu Thu Jan 14 14:01:10 2016 From: boyceg at 
email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 20:01:10 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> Message-ID: <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> > On Jan 14, 2016, at 2:24 PM, Barry Smith wrote: > > > Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. > > From the output we have: > > Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) > Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) > Time to set up the preconditioner is 19% (10 + 9) > Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) > > So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) > > Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. -- Boyce > > > Barry > >> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: >> >> >> >>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >>> >>> I see one hot spot: >> >> >> Here is with opt build >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 >> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 >> >> Max Max/Min Avg Total >> Time (sec): 1.018e+00 1.00000 1.018e+00 >> Objects: 2.935e+03 1.00000 2.935e+03 >> Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 >> Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 >> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flops >> and VecAXPY() for complex vectors of length N --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts %Total Avg %Total counts %Total >> 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
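One way to attribute the time that -log_summary otherwise leaves unexplained inside the linear solve (the roughly 23 % Barry points to above) is to register a user stage and event around the matrix-free pieces. A minimal sketch, in which the names, the UserPCApply wrapper, and where it is called from are illustrative assumptions rather than IBAMR code:

#include <petscsys.h>

static PetscClassId  UserClassId;
static PetscLogEvent UserPCApplyEvent;
static PetscLogStage SolveStage;

/* Call once after PetscInitialize(). */
PetscErrorCode UserLogSetup(void)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = PetscClassIdRegister("UserPC", &UserClassId);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("UserPCApply", UserClassId, &UserPCApplyEvent);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("IB solve", &SolveStage);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Wrap the custom (matrix-free) preconditioner apply so its time shows up
   as its own line in the -log_summary event table. */
PetscErrorCode UserPCApply(void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = PetscLogEventBegin(UserPCApplyEvent, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... the actual SAMRAI/IBAMR smoother or solve work would go here ... */
  ierr = PetscLogEventEnd(UserPCApplyEvent, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The outer solve can likewise be bracketed with PetscLogStagePush(SolveStage) and PetscLogStagePop(), so that setup work outside PETSc is separated from the solve itself in the summary.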
>> %T - percent time in this phase %F - percent flops in this phase >> %M - percent messages in this phase %L - percent message lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 >> VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 >> VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 >> VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 >> VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 >> VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 >> VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 >> VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 >> VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 >> VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 >> VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 >> MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 >> MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >> MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 >> MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 >> MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 
0 0 0 >> MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 >> MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 >> MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >> MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 >> PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 >> PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 >> PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 >> SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 >> SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 >> SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Vector 971 839 15573352 0. >> Vector Scatter 290 289 189584 0. >> Index Set 1171 823 951928 0. >> IS L to G Mapping 110 109 2156656 0. >> Application Order 6 6 99952 0. >> MatMFFD 1 1 776 0. >> Matrix 189 189 24083332 0. >> Matrix Null Space 4 4 2432 0. >> Krylov Solver 90 90 122720 0. >> DMKSP interface 1 1 648 0. >> Preconditioner 90 90 89872 0. >> SNES 1 1 1328 0. >> SNESLineSearch 1 1 984 0. >> DMSNES 1 1 664 0. >> Distributed Mesh 2 2 9168 0. >> Star Forest Bipartite Graph 4 4 3168 0. >> Discrete System 2 2 1712 0. >> Viewer 1 0 0 0. 
>> ======================================================================================================================== >> Average time to get PetscTime(): 9.53674e-07 >> #PETSc Option Table entries: >> -ib_ksp_converged_reason >> -ib_ksp_monitor_true_residual >> -ib_snes_type ksponly >> -log_summary >> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal >> -stokes_ib_pc_level_0_sub_pc_type ilu >> -stokes_ib_pc_level_ksp_richardson_self_scale >> -stokes_ib_pc_level_ksp_type richardson >> -stokes_ib_pc_level_pc_asm_local_type additive >> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >> -stokes_ib_pc_level_sub_pc_type lu >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >> ----------------------------------------- >> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta >> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty >> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc >> Using PETSc arch: linux-opt >> ----------------------------------------- >> >> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} >> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} >> ----------------------------------------- >> >> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl >> ----------------------------------------- From bsmith at mcs.anl.gov Thu 
Jan 14 14:09:12 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 14:09:12 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> Message-ID: > On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene wrote: > >> >> On Jan 14, 2016, at 2:24 PM, Barry Smith wrote: >> >> >> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. >> >> From the output we have: >> >> Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) >> Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) >> Time to set up the preconditioner is 19% (10 + 9) >> Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) >> >> So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) >> >> Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). > > Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. Just put an PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside PETSc setup time. The PETSc time looks reasonable at most I can only image any optimizations we could do bringing it down a small percentage. Barry > > -- Boyce > >> >> >> Barry >> >>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: >>> >>> >>> >>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >>>> >>>> I see one hot spot: >>> >>> >>> Here is with opt build >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 >>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 >>> >>> Max Max/Min Avg Total >>> Time (sec): 1.018e+00 1.00000 1.018e+00 >>> Objects: 2.935e+03 1.00000 2.935e+03 >>> Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 >>> Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 >>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.00000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flops >>> and VecAXPY() for complex vectors of length N --> 8N flops >>> >>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total counts %Total Avg %Total counts %Total >>> 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flops: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> Avg. len: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flops in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 >>> VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 >>> VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 >>> VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 >>> VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 >>> VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 >>> VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 >>> VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 >>> VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 >>> VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 >>> VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 >>> MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 >>> MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 >>> MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 >>> MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 >>> MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 >>> MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>> MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 >>> PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 >>> PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 >>> PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 >>> SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 >>> SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 >>> SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Vector 971 839 15573352 0. >>> Vector Scatter 290 289 189584 0. >>> Index Set 1171 823 951928 0. >>> IS L to G Mapping 110 109 2156656 0. >>> Application Order 6 6 99952 0. >>> MatMFFD 1 1 776 0. >>> Matrix 189 189 24083332 0. >>> Matrix Null Space 4 4 2432 0. >>> Krylov Solver 90 90 122720 0. >>> DMKSP interface 1 1 648 0. >>> Preconditioner 90 90 89872 0. >>> SNES 1 1 1328 0. >>> SNESLineSearch 1 1 984 0. >>> DMSNES 1 1 664 0. >>> Distributed Mesh 2 2 9168 0. >>> Star Forest Bipartite Graph 4 4 3168 0. >>> Discrete System 2 2 1712 0. >>> Viewer 1 0 0 0. 
>>> ======================================================================================================================== >>> Average time to get PetscTime(): 9.53674e-07 >>> #PETSc Option Table entries: >>> -ib_ksp_converged_reason >>> -ib_ksp_monitor_true_residual >>> -ib_snes_type ksponly >>> -log_summary >>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal >>> -stokes_ib_pc_level_0_sub_pc_type ilu >>> -stokes_ib_pc_level_ksp_richardson_self_scale >>> -stokes_ib_pc_level_ksp_type richardson >>> -stokes_ib_pc_level_pc_asm_local_type additive >>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >>> -stokes_ib_pc_level_sub_pc_type lu >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >>> ----------------------------------------- >>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta >>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty >>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc >>> Using PETSc arch: linux-opt >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} >>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} >>> ----------------------------------------- >>> >>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl >>> 
----------------------------------------- From boyceg at email.unc.edu Thu Jan 14 14:30:54 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 20:30:54 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> Message-ID: <376E2C56-9E41-4508-BAE6-9920526820BC@email.unc.edu> On Jan 14, 2016, at 3:09 PM, Barry Smith > wrote: On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene > wrote: On Jan 14, 2016, at 2:24 PM, Barry Smith > wrote: Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. >From the output we have: Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) Time to set up the preconditioner is 19% (10 + 9) Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. Just put an PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside PETSc setup time. The PETSc time looks reasonable at most I can only image any optimizations we could do bringing it down a small percentage. Here is a bit more info about what we are trying to do: This is a Vanka-type MG preconditioner for a Stokes-like system on a structured grid. (Currently just uniform grids, but hopefully soon with AMR.) For the smoother, we are using damped Richardson + ASM with relatively small block subdomains --- e.g., all DOFs associated with 8x8 cells in 2D (~300 DOFs), or 8x8x8 in 3D (~2500 DOFs). Unfortunately, MG iteration counts really tank when using smaller subdomains. I can't remember whether we have quantified this carefully, but PCASM seems to bog down with smaller subdomains. A question is whether there are different implementation choices that could make the case of "lots of little subdomains" run faster. But before we get to that, Amneet and I should take a more careful look at overall solver performance. (We are also starting to play around with PCFIELDSPLIT for this problem too, although we don't have many ideas about how to handle the Schur complement.) Thanks, -- Boyce Barry -- Boyce Barry On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S > wrote: On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene > wrote: I see one hot spot: Here is with opt build ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 Max Max/Min Avg Total Time (sec): 1.018e+00 1.00000 1.018e+00 Objects: 2.935e+03 1.00000 2.935e+03 Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 MatPtAP 4 1.0 4.4426e-02 
1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 971 839 15573352 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951928 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24083332 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 122720 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89872 0. SNES 1 1 1328 0. SNESLineSearch 1 1 984 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9168 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1712 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-07 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_0_sub_pc_type ilu -stokes_ib_pc_level_ksp_richardson_self_scale -stokes_ib_pc_level_ksp_type richardson -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc Using PETSc arch: linux-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl ----------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From song.gao2 at mail.mcgill.ca Thu Jan 14 15:01:00 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Thu, 14 Jan 2016 16:01:00 -0500 Subject: [petsc-users] Profile a matrix-free solver. Message-ID: Hello, I am profiling a finite element Navier-Stokes solver. It uses the Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a matrix-free version of Symmetic Gauss-Seidel ). The log summary is attached. Four events are registered. compute_rhs is compute rhs (used by MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. I'm wondering, is the percent time of the events reasonable in the table? I see 69% time is spent on matmult_mffd. Is it expected in matrix-free method? What might be a good starting point of profiling this solver? Thank you in advance. Song Gao -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_summary Type: application/octet-stream Size: 9058 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Jan 14 15:13:40 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 15:13:40 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <376E2C56-9E41-4508-BAE6-9920526820BC@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> <376E2C56-9E41-4508-BAE6-9920526820BC@email.unc.edu> Message-ID: > On Jan 14, 2016, at 2:30 PM, Griffith, Boyce Eugene wrote: > >> >> On Jan 14, 2016, at 3:09 PM, Barry Smith wrote: >> >>> >>> On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene wrote: >>> >>>> >>>> On Jan 14, 2016, at 2:24 PM, Barry Smith wrote: >>>> >>>> >>>> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. >>>> >>>> From the output we have: >>>> >>>> Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) >>>> Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) >>>> Time to set up the preconditioner is 19% (10 + 9) >>>> Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) >>>> >>>> So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) >>>> >>>> Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). >>> >>> Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. >> >> Just put an PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside PETSc setup time. >> >> The PETSc time looks reasonable at most I can only image any optimizations we could do bringing it down a small percentage. > > Here is a bit more info about what we are trying to do: > > This is a Vanka-type MG preconditioner for a Stokes-like system on a structured grid. 
(Currently just uniform grids, but hopefully soon with AMR.) For the smoother, we are using damped Richardson + ASM with relatively small block subdomains --- e.g., all DOFs associated with 8x8 cells in 2D (~300 DOFs), or 8x8x8 in 3D (~2500 DOFs). Unfortunately, MG iteration counts really tank when using smaller subdomains. > > I can't remember whether we have quantified this carefully, but PCASM seems to bog down with smaller subdomains. A question is whether there are different implementation choices that could make the case of "lots of little subdomains" run faster. This is possibly somewhere where WE (PETSc) could perhaps due a better job. When originally written we definitely were biased to a small number of large subdomains so things like getting the sub matrices (and even iterating over them) could possibly be optimized when there are many. However, as you note, in the current runs this is definitely not the issue. Barry > But before we get to that, Amneet and I should take a more careful look at overall solver performance. > > (We are also starting to play around with PCFIELDSPLIT for this problem too, although we don't have many ideas about how to handle the Schur complement.) > > Thanks, > > -- Boyce > >> >> >> Barry >> >>> >>> -- Boyce >>> >>>> >>>> >>>> Barry >>>> >>>>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: >>>>> >>>>> >>>>> >>>>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >>>>>> >>>>>> I see one hot spot: >>>>> >>>>> >>>>> Here is with opt build >>>>> >>>>> ************************************************************************************************************************ >>>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >>>>> ************************************************************************************************************************ >>>>> >>>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>>>> >>>>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 >>>>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 >>>>> >>>>> Max Max/Min Avg Total >>>>> Time (sec): 1.018e+00 1.00000 1.018e+00 >>>>> Objects: 2.935e+03 1.00000 2.935e+03 >>>>> Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 >>>>> Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 >>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Reductions: 0.000e+00 0.00000 >>>>> >>>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>>>> e.g., VecAXPY() for real vectors of length N --> 2N flops >>>>> and VecAXPY() for complex vectors of length N --> 8N flops >>>>> >>>>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- >>>>> Avg %Total Avg %Total counts %Total Avg %Total counts %Total >>>>> 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>>>> >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. 
>>>>> Phase summary info: >>>>> Count: number of times phase was executed >>>>> Time and Flops: Max - maximum over all processors >>>>> Ratio - ratio of maximum to minimum over all processors >>>>> Mess: number of messages sent >>>>> Avg. len: average message length (bytes) >>>>> Reduct: number of global reductions >>>>> Global: entire computation >>>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). >>>>> %T - percent time in this phase %F - percent flops in this phase >>>>> %M - percent messages in this phase %L - percent message lengths in this phase >>>>> %R - percent reductions in this phase >>>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>>>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> >>>>> --- Event Stage 0: Main Stage >>>>> >>>>> VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 >>>>> VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 >>>>> VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 >>>>> VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 >>>>> VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 >>>>> VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 >>>>> VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 >>>>> VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 >>>>> VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 >>>>> VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 >>>>> VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>> BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 >>>>> MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 >>>>> MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>>>> MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 >>>>> MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 >>>>> MatAssemblyBegin 108 1.0 
6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>> MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 >>>>> MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 >>>>> MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>> MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>>>> MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 >>>>> PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 >>>>> PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 >>>>> PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 >>>>> SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 >>>>> SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 >>>>> SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> >>>>> Memory usage is given in bytes: >>>>> >>>>> Object Type Creations Destructions Memory Descendants' Mem. >>>>> Reports information only for process 0. >>>>> >>>>> --- Event Stage 0: Main Stage >>>>> >>>>> Vector 971 839 15573352 0. >>>>> Vector Scatter 290 289 189584 0. >>>>> Index Set 1171 823 951928 0. >>>>> IS L to G Mapping 110 109 2156656 0. >>>>> Application Order 6 6 99952 0. >>>>> MatMFFD 1 1 776 0. >>>>> Matrix 189 189 24083332 0. >>>>> Matrix Null Space 4 4 2432 0. >>>>> Krylov Solver 90 90 122720 0. >>>>> DMKSP interface 1 1 648 0. >>>>> Preconditioner 90 90 89872 0. >>>>> SNES 1 1 1328 0. >>>>> SNESLineSearch 1 1 984 0. >>>>> DMSNES 1 1 664 0. >>>>> Distributed Mesh 2 2 9168 0. >>>>> Star Forest Bipartite Graph 4 4 3168 0. >>>>> Discrete System 2 2 1712 0. >>>>> Viewer 1 0 0 0. 
>>>>> ======================================================================================================================== >>>>> Average time to get PetscTime(): 9.53674e-07 >>>>> #PETSc Option Table entries: >>>>> -ib_ksp_converged_reason >>>>> -ib_ksp_monitor_true_residual >>>>> -ib_snes_type ksponly >>>>> -log_summary >>>>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal >>>>> -stokes_ib_pc_level_0_sub_pc_type ilu >>>>> -stokes_ib_pc_level_ksp_richardson_self_scale >>>>> -stokes_ib_pc_level_ksp_type richardson >>>>> -stokes_ib_pc_level_pc_asm_local_type additive >>>>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >>>>> -stokes_ib_pc_level_sub_pc_type lu >>>>> #End of PETSc Option Table entries >>>>> Compiled without FORTRAN kernels >>>>> Compiled with full precision matrices (default) >>>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>>>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >>>>> ----------------------------------------- >>>>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta >>>>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty >>>>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc >>>>> Using PETSc arch: linux-opt >>>>> ----------------------------------------- >>>>> >>>>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} >>>>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} >>>>> ----------------------------------------- >>>>> >>>>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include >>>>> ----------------------------------------- >>>>> >>>>> Using C linker: mpicc >>>>> Using Fortran linker: mpif90 >>>>> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl 
-Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl >>>>> ----------------------------------------- From bsmith at mcs.anl.gov Thu Jan 14 15:24:08 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 15:24:08 -0600 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: Message-ID: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> So KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this makes sense; the solver time is essentially the multiply time plus the PCApply time. compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 1.1e+04 71 0100100 39 71 0100100 39 0 LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 Depending on the "quality" of the preconditioner (if it is really good) one expects the preconditioner time to be larger than the MatMult(). Only for simple preconditioners (like Jacobi) does one see it being much less than the MatMult(). For matrix based solvers the amount of work in SGS is as large as the amount of work in the MatMult() if not more, so I would expect the time of the preconditioner to be higher than the time of the multiply. So based on knowing almost nothing I think the MatMult_ is taking more time then it should unless you are ignoring (skipping) a lot of the terms in your matrix-free SGS; then it is probably reasonable. Barry > On Jan 14, 2016, at 3:01 PM, Song Gao wrote: > > Hello, > > I am profiling a finite element Navier-Stokes solver. It uses the Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a matrix-free version of Symmetic Gauss-Seidel ). The log summary is attached. Four events are registered. compute_rhs is compute rhs (used by MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > I'm wondering, is the percent time of the events reasonable in the table? I see 69% time is spent on matmult_mffd. Is it expected in matrix-free method? What might be a good starting point of profiling this solver? Thank you in advance. > > > Song Gao > From knepley at gmail.com Thu Jan 14 15:25:00 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 15:25:00 -0600 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: Message-ID: On Thu, Jan 14, 2016 at 3:01 PM, Song Gao wrote: > Hello, > > I am profiling a finite element Navier-Stokes solver. It uses the > Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a > matrix-free version of Symmetic Gauss-Seidel ). The log summary is > attached. Four events are registered. compute_rhs is compute rhs (used by > MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the > custom preconditioner. I didn't call PetscLogFlops so these flops are > zeros. > > I'm wondering, is the percent time of the events reasonable in the table? > I see 69% time is spent on matmult_mffd. Is it expected in matrix-free > method? What might be a good starting point of profiling this solver? Thank > you in advance. > The way I read this, you are taking about 23 iterates/solve, and most of your work is residual computation which should be highly parallelizable/vectorizable. This seems great to me. 
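Barry's earlier suggestion (wrap the non-PETSc pieces in PetscLogEvents and put a stage around the setup done outside SNESSolve) and Song's note that his custom events report zero flops both come down to the same small amount of instrumentation. The sketch below is a self-contained toy, not code from IBAMR or Song's solver: the stage name, the dummy loop, and the flop count are placeholders, while the PetscLogStageRegister / PetscLogEventRegister / PetscLogFlops calls are the standard PETSc profiling API, so the event shows up in the -log_summary tables with a meaningful time and Mflop/s entry.

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogStage  setup_stage;
  PetscLogEvent  rhs_event;
  PetscClassId   classid;
  PetscInt       i, n = 1000000;
  PetscReal      sum = 0.0;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* A stage for setup work done outside SNESSolve()/KSPSolve() */
  ierr = PetscLogStageRegister("User setup", &setup_stage);CHKERRQ(ierr);
  /* An event for a user kernel, e.g. a matrix-free residual evaluation */
  ierr = PetscClassIdRegister("User classes", &classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("compute_rhs", classid, &rhs_event);CHKERRQ(ierr);

  ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
  /* ... mesh/data-structure setup would go here ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogEventBegin(rhs_event, 0, 0, 0, 0);CHKERRQ(ierr);
  for (i = 0; i < n; ++i) sum += 1.0 / (PetscReal)(i + 1);  /* stand-in for the real kernel */
  ierr = PetscLogFlops(2.0 * n);CHKERRQ(ierr);              /* 1 divide + 1 add per iteration */
  ierr = PetscLogEventEnd(rhs_event, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscPrintf(PETSC_COMM_WORLD, "sum = %g\n", (double)sum);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running this with -log_summary (as in the runs above) lists "User setup" in the summary of stages and "compute_rhs" in the Main Stage event table; in a real code the PetscLogFlops argument would be an estimate of the operations actually performed inside the event, which is what makes the Mflop/s column nonzero for matrix-free kernels.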
Matt > > > Song Gao > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 14 18:31:58 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 15 Jan 2016 00:31:58 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> Message-ID: <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> On Jan 14, 2016, at 11:24 AM, Barry Smith > wrote: Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). @Barry ? Attached is the -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main2dProfiling.numbers Type: application/octet-stream Size: 207814 bytes Desc: main2dProfiling.numbers URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 14 18:36:14 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 15 Jan 2016 00:36:14 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> Message-ID: And the PETSc log summary for comparison ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 19:34:38 2016 Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 Max Max/Min Avg Total Time (sec): 6.223e-01 1.00000 6.223e-01 Objects: 2.618e+03 1.00000 2.618e+03 Flops: 1.948e+08 1.00000 1.948e+08 1.948e+08 Flops/sec: 3.129e+08 1.00000 3.129e+08 3.129e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 6.2232e-01 100.0% 1.9476e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 2.9087e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1139 VecDotNorm2 180 1.0 1.0626e-03 1.0 3.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2983 VecMDot 288 1.0 3.8970e-03 1.0 2.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 611 VecNorm 113 1.0 9.6560e-04 1.0 1.36e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 140 VecScale 66 1.0 4.0913e-04 1.0 1.21e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 295 VecCopy 24 1.0 3.8338e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 12855 1.0 1.0173e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 607 1.0 2.9583e-03 1.0 4.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1680 VecAYPX 169 1.0 8.6975e-04 1.0 6.41e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 737 VecAXPBYCZ 34 1.0 1.1325e-04 1.0 2.77e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2443 VecWAXPY 54 1.0 1.4043e-04 1.0 2.30e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1637 VecMAXPY 301 1.0 5.6567e-03 1.0 2.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 421 VecSwap 103 1.0 3.2711e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 561 1.0 3.5629e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAssemblyEnd 561 1.0 6.4468e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 18427 1.0 1.5277e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 BuildTwoSidedF 554 1.0 1.9150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 361 1.0 6.2765e-02 1.0 5.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 30 0 0 0 10 30 0 0 0 924 MatSolve 6108 1.0 6.3529e-02 1.0 9.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 49 0 0 0 10 49 0 0 0 1500 MatLUFactorSym 85 1.0 2.0353e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatLUFactorNum 85 1.0 2.2882e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 995 MatScale 4 1.0 2.2912e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1096 MatAssemblyBegin 108 1.0 6.7949e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 2.9209e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 2.0407e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatGetRowIJ 85 1.0 1.2467e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 8.2304e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 85 1.0 7.8776e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAXPY 4 1.0 4.9517e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 MatPtAP 4 1.0 4.4372e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 7 3 0 0 0 112 MatPtAPSymbolic 4 1.0 
2.7586e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.6756e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 298 MatGetSymTrans 4 1.0 3.6120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 12 1.0 4.9458e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSetUp 90 1.0 5.6815e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.8819e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 62 97 0 0 0 62 97 0 0 0 488 PCSetUp 90 1.0 6.4402e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 12 0 0 0 10 12 0 0 0 354 PCSetUpOnBlocks 84 1.0 5.2499e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 8 12 0 0 0 8 12 0 0 0 434 PCApply 12 1.0 3.4369e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 55 97 0 0 0 55 97 0 0 0 549 SNESSolve 1 1.0 3.9208e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 63 97 0 0 0 63 97 0 0 0 483 SNESFunctionEval 2 1.0 3.2527e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 14 SNESJacobianEval 1 1.0 4.6706e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 739 639 9087400 0. Vector Scatter 290 289 189584 0. Index Set 1086 738 885136 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 19106368 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 122720 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89864 0. SNES 1 1 1328 0. SNESLineSearch 1 1 984 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9168 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1712 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 7.15256e-07 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_ksp_richardson_self_scale -stokes_ib_pc_level_ksp_type richardson -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_pc_asm_type interpolate -stokes_ib_pc_level_sub_pc_factor_shift_type nonzero -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc Using PETSc arch: linux-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl ----------------------------------------- On Jan 14, 2016, at 4:31 PM, Bhalla, Amneet Pal Singh > wrote: On Jan 14, 2016, at 11:24 AM, Barry Smith > wrote: Also getting the results with Instruments or HPCToolkit would be useful (so long as we 
don't need to install HPCTool ourselves to see the results). @Barry ? Attached is the output from HPCToolkit profiler for all the operations done in solving 1 timestep Stokes+IB simulation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 14 21:16:57 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 21:16:57 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> Message-ID: Ok, thanks. From the PETSc side of things this doesn't tell us anything new but does show the "missing" time in the solver (attached). > On Jan 14, 2016, at 6:31 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 14, 2016, at 11:24 AM, Barry Smith wrote: >> >> Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). > > @Barry ? Attached is the > > output from HPCToolkit profiler for all the operations done in solving 1 timestep Stokes+IB simulation. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 11090 bytes Desc: not available URL: From praveenpetsc at gmail.com Fri Jan 15 00:18:29 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Fri, 15 Jan 2016 11:48:29 +0530 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: I?m struggling to figure out *undefined reference to `vectorset_*. I?ve included both petscvec.h and petscvec.h90 but the error appears again. I?m attaching makefile and code. any help will be appreciated. On Thu, Jan 14, 2016 at 9:39 PM, Matthew Knepley wrote: > On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar > wrote: > >> I?ve written a fortan code (F90) for domain decomposition.* I've >> specified **the paths of include files and libraries, but the >> compiler/linker still * >> >> >> *complained about undefined references.undefined reference to >> `vectorset_'* >> > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > >> >> *undefined reference to `dmdagetlocalinfo_'* >> > > This function is not supported in Fortran since it takes a structure. > > Thanks, > > Matt > > >> I?m attaching makefile and code. any help will be appreciated. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 477 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.F90 Type: text/x-fortran Size: 3139 bytes Desc: not available URL: From balay at mcs.anl.gov Fri Jan 15 00:35:09 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 15 Jan 2016 00:35:09 -0600 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Matt already responded to this. You should be using VecSet() - not VectorSet(). I'm not sure where you got VectorSet() from... > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html Satish On Fri, 15 Jan 2016, praveen kumar wrote: > I?m struggling to figure out *undefined reference to `vectorset_*. I?ve > included both petscvec.h and petscvec.h90 but the error appears again. > I?m attaching makefile and code. any help will be appreciated. > > On Thu, Jan 14, 2016 at 9:39 PM, Matthew Knepley wrote: > > > On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar > > wrote: > > > >> I?ve written a fortan code (F90) for domain decomposition.* I've > >> specified **the paths of include files and libraries, but the > >> compiler/linker still * > >> > >> > >> *complained about undefined references.undefined reference to > >> `vectorset_'* > >> > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > > > > >> > >> *undefined reference to `dmdagetlocalinfo_'* > >> > > > > This function is not supported in Fortran since it takes a structure. > > > > Thanks, > > > > Matt > > > > > >> I?m attaching makefile and code. any help will be appreciated. > >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > From praveenpetsc at gmail.com Fri Jan 15 00:41:42 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Fri, 15 Jan 2016 12:11:42 +0530 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Thanks a lot Satish. On Fri, Jan 15, 2016 at 12:05 PM, Satish Balay wrote: > Matt already responded to this. You should be using VecSet() - not > VectorSet(). > I'm not sure where you got VectorSet() from... > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > Satish > > On Fri, 15 Jan 2016, praveen kumar wrote: > > > I?m struggling to figure out *undefined reference to `vectorset_*. I?ve > > included both petscvec.h and petscvec.h90 but the error appears again. > > I?m attaching makefile and code. any help will be appreciated. > > > > On Thu, Jan 14, 2016 at 9:39 PM, Matthew Knepley > wrote: > > > > > On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar < > praveenpetsc at gmail.com> > > > wrote: > > > > > >> I?ve written a fortan code (F90) for domain decomposition.* I've > > >> specified **the paths of include files and libraries, but the > > >> compiler/linker still * > > >> > > >> > > >> *complained about undefined references.undefined reference to > > >> `vectorset_'* > > >> > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > > > > > > > >> > > >> *undefined reference to `dmdagetlocalinfo_'* > > >> > > > > > > This function is not supported in Fortran since it takes a structure. > > > > > > Thanks, > > > > > > Matt > > > > > > > > >> I?m attaching makefile and code. any help will be appreciated. 
> > >> > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > their > > > experiments lead. > > > -- Norbert Wiener > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 10:52:29 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 11:52:29 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> References: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> Message-ID: Hello, Barry, Thanks for your prompt reply. I ran the matrix-based solver with matrix-based SGS precondioner. I see your point. The profiling table is below and attached. So Matmult takes 4% time and PCApply takes 43% time. MatMult 636 1.0 9.0361e+00 1.0 9.21e+09 1.0 7.6e+03 1.1e+04 0.0e+00 4 85 52 17 0 4 85 52 17 0 3980 PCApply 636 1.0 8.7006e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03 43 0 0 0 24 43 0 0 0 24 0 The way I see it, the matrix-free solver spends most of the time (70%) on matmult or equivalently rhs evaluation. Every KSP iteration, one rhs evaluation is performed. This is much more costly than a matrix vector product in a matrix-based solver. Perhaps this is expected in matrix-free solver. I will start look at the rhs evaluation since it takes the most time. Thanks. Song Gao 2016-01-14 16:24 GMT-05:00 Barry Smith : > > So > > KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this > makes sense; the solver time is essentially the > multiply time plus the PCApply time. > > compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 > 1.1e+04 71 0100100 39 71 0100100 39 0 > LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 > SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 > VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 > > Depending on the "quality" of the preconditioner (if it is really good) > one expects the preconditioner time to be larger than the MatMult(). Only > for simple preconditioners (like Jacobi) does one see it being much less > than the MatMult(). For matrix based solvers the amount of work in SGS is > as large as the amount of work in the MatMult() if not more, so I would > expect the time of the preconditioner to be higher than the time of the > multiply. > > So based on knowing almost nothing I think the MatMult_ is taking more > time then it should unless you are ignoring (skipping) a lot of the terms > in your matrix-free SGS; then it is probably reasonable. > > Barry > > > > > On Jan 14, 2016, at 3:01 PM, Song Gao wrote: > > > > Hello, > > > > I am profiling a finite element Navier-Stokes solver. It uses the > Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a > matrix-free version of Symmetic Gauss-Seidel ). The log summary is > attached. Four events are registered. compute_rhs is compute rhs (used by > MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the > custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > > > I'm wondering, is the percent time of the events reasonable in the > table? I see 69% time is spent on matmult_mffd. Is it expected in > matrix-free method? What might be a good starting point of profiling this > solver? Thank you in advance. 
> > > > > > Song Gao > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_summary_matrix_based_version Type: application/octet-stream Size: 9473 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Jan 15 13:42:34 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 13:42:34 -0600 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> Message-ID: <97BFBC85-B01B-4A64-B162-05F340E9BCD6@mcs.anl.gov> > On Jan 15, 2016, at 10:52 AM, Song Gao wrote: > > Hello, Barry, > > Thanks for your prompt reply. I ran the matrix-based solver with matrix-based SGS precondioner. I see your point. The profiling table is below and attached. > > So Matmult takes 4% time and PCApply takes 43% time. > > MatMult 636 1.0 9.0361e+00 1.0 9.21e+09 1.0 7.6e+03 1.1e+04 0.0e+00 4 85 52 17 0 4 85 52 17 0 3980 > PCApply 636 1.0 8.7006e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03 43 0 0 0 24 43 0 0 0 24 0 > > > The way I see it, the matrix-free solver spends most of the time (70%) on matmult or equivalently rhs evaluation. Every KSP iteration, one rhs evaluation is performed. This is much more costly than a matrix vector product in a matrix-based solver. Sure, but if the matrix-free SGS mimics all the work of the right hand side function evaluation (which is has to if it truly is a a SGS sweep and not some approximation (where you drop certain terms in the right hand side function when you compute the SGS)) then the matrix-free SGS should be at least as expensive as the right hand side evaluation. Barry My guess is your SGS drops some terms so is only and approximation, but is still good enough as a preconditioner. > Perhaps this is expected in matrix-free solver. > > I will start look at the rhs evaluation since it takes the most time. > > Thanks. > Song Gao > > > > 2016-01-14 16:24 GMT-05:00 Barry Smith : > > So > > KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this makes sense; the solver time is essentially the > multiply time plus the PCApply time. > > compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 1.1e+04 71 0100100 39 71 0100100 39 0 > LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 > SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 > VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 > > Depending on the "quality" of the preconditioner (if it is really good) one expects the preconditioner time to be larger than the MatMult(). Only for simple preconditioners (like Jacobi) does one see it being much less than the MatMult(). For matrix based solvers the amount of work in SGS is as large as the amount of work in the MatMult() if not more, so I would expect the time of the preconditioner to be higher than the time of the multiply. > > So based on knowing almost nothing I think the MatMult_ is taking more time then it should unless you are ignoring (skipping) a lot of the terms in your matrix-free SGS; then it is probably reasonable. > > Barry > > > > > On Jan 14, 2016, at 3:01 PM, Song Gao wrote: > > > > Hello, > > > > I am profiling a finite element Navier-Stokes solver. It uses the Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a matrix-free version of Symmetic Gauss-Seidel ). The log summary is attached. 
Four events are registered. compute_rhs is compute rhs (used by MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > > > I'm wondering, is the percent time of the events reasonable in the table? I see 69% time is spent on matmult_mffd. Is it expected in matrix-free method? What might be a good starting point of profiling this solver? Thank you in advance. > > > > > > Song Gao > > > > > From jed at jedbrown.org Fri Jan 15 13:56:19 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 15 Jan 2016 12:56:19 -0700 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: Message-ID: <874mee62ho.fsf@jedbrown.org> Matthew Knepley writes: > The way I read this, you are taking about 23 iterates/solve, and most of > your work is residual computation which should > be highly parallelizable/vectorizable. This seems great to me. This in the sense that it's up to you to determine whether your matrix-free residual and preconditioning code is fast. This profile merely says that almost all of the run-time is in *your code*. If your code is fast, then this is good performance. If you can use a different algorithm to converge in fewer iterations, or a different representation to apply the operator faster, then you could do better. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 14:33:54 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 15:33:54 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: <97BFBC85-B01B-4A64-B162-05F340E9BCD6@mcs.anl.gov> References: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> <97BFBC85-B01B-4A64-B162-05F340E9BCD6@mcs.anl.gov> Message-ID: Yes, you are right. In matrix-free SGS, the AUSM 2nd order inviscid fluxes are replace by a simpler first order numerical fluxes. 2016-01-15 14:42 GMT-05:00 Barry Smith : > > > On Jan 15, 2016, at 10:52 AM, Song Gao wrote: > > > > Hello, Barry, > > > > Thanks for your prompt reply. I ran the matrix-based solver with > matrix-based SGS precondioner. I see your point. The profiling table is > below and attached. > > > > So Matmult takes 4% time and PCApply takes 43% time. > > > > MatMult 636 1.0 9.0361e+00 1.0 9.21e+09 1.0 7.6e+03 > 1.1e+04 0.0e+00 4 85 52 17 0 4 85 52 17 0 3980 > > PCApply 636 1.0 8.7006e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 1.9e+03 43 0 0 0 24 43 0 0 0 24 0 > > > > > > The way I see it, the matrix-free solver spends most of the time (70%) > on matmult or equivalently rhs evaluation. Every KSP iteration, one rhs > evaluation is performed. This is much more costly than a matrix vector > product in a matrix-based solver. > > Sure, but if the matrix-free SGS mimics all the work of the right hand > side function evaluation (which is has to if it truly is a a SGS sweep and > not some approximation (where you drop certain terms in the right hand side > function when you compute the SGS)) then the matrix-free SGS should be at > least as expensive as the right hand side evaluation. > > Barry > > > My guess is your SGS drops some terms so is only and approximation, but is > still good enough as a preconditioner. > > > Perhaps this is expected in matrix-free solver. > > > > I will start look at the rhs evaluation since it takes the most time. > > > > Thanks. 
> > Song Gao > > > > > > > > 2016-01-14 16:24 GMT-05:00 Barry Smith : > > > > So > > > > KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this > makes sense; the solver time is essentially the > > multiply time plus the PCApply time. > > > > compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 > 1.1e+04 71 0100100 39 71 0100100 39 0 > > LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 > > SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 > > VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 > > > > Depending on the "quality" of the preconditioner (if it is really > good) one expects the preconditioner time to be larger than the MatMult(). > Only for simple preconditioners (like Jacobi) does one see it being much > less than the MatMult(). For matrix based solvers the amount of work in > SGS is as large as the amount of work in the MatMult() if not more, so I > would expect the time of the preconditioner to be higher than the time of > the multiply. > > > > So based on knowing almost nothing I think the MatMult_ is taking more > time then it should unless you are ignoring (skipping) a lot of the terms > in your matrix-free SGS; then it is probably reasonable. > > > > Barry > > > > > > > > > On Jan 14, 2016, at 3:01 PM, Song Gao > wrote: > > > > > > Hello, > > > > > > I am profiling a finite element Navier-Stokes solver. It uses the > Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a > matrix-free version of Symmetic Gauss-Seidel ). The log summary is > attached. Four events are registered. compute_rhs is compute rhs (used by > MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the > custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > > > > > I'm wondering, is the percent time of the events reasonable in the > table? I see 69% time is spent on matmult_mffd. Is it expected in > matrix-free method? What might be a good starting point of profiling this > solver? Thank you in advance. > > > > > > > > > Song Gao > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 14:34:58 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 15:34:58 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: <874mee62ho.fsf@jedbrown.org> References: <874mee62ho.fsf@jedbrown.org> Message-ID: Thanks. I'll try to improve "my code" 2016-01-15 14:56 GMT-05:00 Jed Brown : > Matthew Knepley writes: > > The way I read this, you are taking about 23 iterates/solve, and most of > > your work is residual computation which should > > be highly parallelizable/vectorizable. This seems great to me. > > This in the sense that it's up to you to determine whether your > matrix-free residual and preconditioning code is fast. This profile > merely says that almost all of the run-time is in *your code*. If your > code is fast, then this is good performance. If you can use a different > algorithm to converge in fewer iterations, or a different representation > to apply the operator faster, then you could do better. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jan 15 14:38:43 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 14:38:43 -0600 Subject: [petsc-users] Profile a matrix-free solver. 
In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: > On Jan 15, 2016, at 2:34 PM, Song Gao wrote: > > Thanks. I'll try to improve "my code" Here you can benefit from Instruments since it can show line by line and loop level hotspots in your compute rhs Barry > > 2016-01-15 14:56 GMT-05:00 Jed Brown : > Matthew Knepley writes: > > The way I read this, you are taking about 23 iterates/solve, and most of > > your work is residual computation which should > > be highly parallelizable/vectorizable. This seems great to me. > > This in the sense that it's up to you to determine whether your > matrix-free residual and preconditioning code is fast. This profile > merely says that almost all of the run-time is in *your code*. If your > code is fast, then this is good performance. If you can use a different > algorithm to converge in fewer iterations, or a different representation > to apply the operator faster, then you could do better. > From ling.zou at inl.gov Fri Jan 15 14:39:40 2016 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Fri, 15 Jan 2016 13:39:40 -0700 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: Hi Song, I wonder if you have a reference paper on the preconditioning algorithm you are working on, i.e., using the 1st order flux for preconditioning purpose when your 'true' fluxes are evaluated using the 2nd order AUSM scheme. Best, Ling On Fri, Jan 15, 2016 at 1:34 PM, Song Gao wrote: > Thanks. I'll try to improve "my code" > > 2016-01-15 14:56 GMT-05:00 Jed Brown : > >> Matthew Knepley writes: >> > The way I read this, you are taking about 23 iterates/solve, and most of >> > your work is residual computation which should >> > be highly parallelizable/vectorizable. This seems great to me. >> >> This in the sense that it's up to you to determine whether your >> matrix-free residual and preconditioning code is fast. This profile >> merely says that almost all of the run-time is in *your code*. If your >> code is fast, then this is good performance. If you can use a different >> algorithm to converge in fewer iterations, or a different representation >> to apply the operator faster, then you could do better. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 14:58:20 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 15:58:20 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: Yes. http://www.sciencedirect.com/science/article/pii/S0021999198960764 It's on page 668 equation 4.6. Thanks 2016-01-15 15:39 GMT-05:00 Zou (Non-US), Ling : > Hi Song, I wonder if you have a reference paper on the preconditioning > algorithm you are working on, i.e., using the 1st order flux for > preconditioning purpose when your 'true' fluxes are evaluated using the 2nd > order AUSM scheme. > > Best, > > Ling > > On Fri, Jan 15, 2016 at 1:34 PM, Song Gao > wrote: > >> Thanks. I'll try to improve "my code" >> >> 2016-01-15 14:56 GMT-05:00 Jed Brown : >> >>> Matthew Knepley writes: >>> > The way I read this, you are taking about 23 iterates/solve, and most >>> of >>> > your work is residual computation which should >>> > be highly parallelizable/vectorizable. This seems great to me. >>> >>> This in the sense that it's up to you to determine whether your >>> matrix-free residual and preconditioning code is fast. 
This profile >>> merely says that almost all of the run-time is in *your code*. If your >>> code is fast, then this is good performance. If you can use a different >>> algorithm to converge in fewer iterations, or a different representation >>> to apply the operator faster, then you could do better. >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Fri Jan 15 15:02:42 2016 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Fri, 15 Jan 2016 14:02:42 -0700 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: Thank you very much! Ling On Fri, Jan 15, 2016 at 1:58 PM, Song Gao wrote: > Yes. > http://www.sciencedirect.com/science/article/pii/S0021999198960764 > > > It's on page 668 equation 4.6. > > Thanks > > 2016-01-15 15:39 GMT-05:00 Zou (Non-US), Ling : > >> Hi Song, I wonder if you have a reference paper on the preconditioning >> algorithm you are working on, i.e., using the 1st order flux for >> preconditioning purpose when your 'true' fluxes are evaluated using the 2nd >> order AUSM scheme. >> >> Best, >> >> Ling >> >> On Fri, Jan 15, 2016 at 1:34 PM, Song Gao >> wrote: >> >>> Thanks. I'll try to improve "my code" >>> >>> 2016-01-15 14:56 GMT-05:00 Jed Brown : >>> >>>> Matthew Knepley writes: >>>> > The way I read this, you are taking about 23 iterates/solve, and most >>>> of >>>> > your work is residual computation which should >>>> > be highly parallelizable/vectorizable. This seems great to me. >>>> >>>> This in the sense that it's up to you to determine whether your >>>> matrix-free residual and preconditioning code is fast. This profile >>>> merely says that almost all of the run-time is in *your code*. If your >>>> code is fast, then this is good performance. If you can use a different >>>> algorithm to converge in fewer iterations, or a different representation >>>> to apply the operator faster, then you could do better. >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Fri Jan 15 17:33:21 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 15 Jan 2016 23:33:21 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() Message-ID: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> Hi Barry, In our code at each timestep we build MG level smoothers using PETSc KSP solvers. We are using a PETSc function KSPSetFromOptions() after we set some default values to the KSP. However, the profiler is showing that PetscOptionsFindPair_Private() is taking about 14% of total runtime. We ran the code for 100 timesteps, and preconditioner is built everytime step. 
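Concretely, the pattern just described looks roughly like the sketch below (a minimal sketch only; the helper name, the per-level Mat/Vec arguments, and the options prefix are illustrative placeholders, not the actual smoother-construction code):

    #include <petscksp.h>

    /* Hypothetical per-timestep rebuild of one level smoother; A_level, b_level,
       and x_level are assumed to come from the regridded AMR hierarchy. */
    static PetscErrorCode BuildAndApplyLevelSmoother(Mat A_level, Vec b_level, Vec x_level)
    {
      KSP            smoother;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPCreate(PetscObjectComm((PetscObject)A_level), &smoother);CHKERRQ(ierr); /* fresh object every timestep */
      ierr = KSPSetOptionsPrefix(smoother, "stokes_ib_pc_level_");CHKERRQ(ierr);        /* illustrative prefix only    */
      ierr = KSPSetOperators(smoother, A_level, A_level);CHKERRQ(ierr);
      ierr = KSPSetType(smoother, KSPRICHARDSON);CHKERRQ(ierr);  /* default set in code ...               */
      ierr = KSPSetFromOptions(smoother);CHKERRQ(ierr);          /* ... then overridden from the database */
      ierr = KSPSolve(smoother, b_level, x_level);CHKERRQ(ierr); /* KSPSolve() also queries the options
                                                                    database for viewer/monitor flags     */
      ierr = KSPDestroy(&smoother);CHKERRQ(ierr);                /* torn down and recreated next timestep */
      PetscFunctionReturn(0);
    }

With many short subdomain/level solves per timestep, those per-call string lookups in the options database are what accumulate.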
I am posting a sequence of calls to KSPSolve_Richardson that shows getting PETScOptions adds up to a lot of cost "KSPSolve_Richardson" 8.80e+05 12.7% "PCApplyBAorAB" 5.25e+05 7.6% "PCApply" 4.85e+05 7.0% "PCApply_ASM" 4.85e+05 7.0% "KSPSolve" 4.53e+05 6.6% "KSPSolve_PREONLY" 2.06e+05 3.0% "PetscObjectViewFromOptions" 3.19e+04 0.5% "PetscObjectViewFromOptions" 2.39e+04 0.3% "PetscOptionsGetBool" 2.39e+04 0.3% "PetscOptionsHasName" 2.39e+04 0.3% "PetscOptionsGetBool" 2.39e+04 0.3% "PetscOptionsHasName" 1.60e+04 0.2% "PetscOptionsGetBool" 1.60e+04 0.2% "PetscObjectViewFromOptions" 1.60e+04 0.2% "KSPReasonViewFromOptions" 1.60e+04 0.2% "PetscOptionsGetBool" 1.56e+04 0.2% "PetscObjectViewFromOptions" 7.98e+03 0.1% "KSPSetUpOnBlocks" 7.98e+03 0.1% "PetscOptionsGetBool" 7.98e+03 0.1% "VecSet" 7.97e+03 0.1% "PetscObjectViewFromOptions" 7.92e+03 0.1% Do you have some suggestions as to doing it in a fast way -- maybe parsing options only once in the simulation and making populating KSP options essentially a no-op? Thanks, --Amneet From jed at jedbrown.org Fri Jan 15 17:53:11 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 15 Jan 2016 16:53:11 -0700 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> Message-ID: <8760yu4cyg.fsf@jedbrown.org> "Bhalla, Amneet Pal S" writes: > Hi Barry, > > In our code at each timestep we build MG level smoothers using PETSc > KSP solvers. Do you need to create new objects versus merely resetting them? I suspect that calling KSPReset() between timesteps instead of creating a new object and calling KSPSetFromOptions() will fix your performance woes. > We are using a PETSc function KSPSetFromOptions() after we set some > default values to the KSP. However, the profiler is showing that > PetscOptionsFindPair_Private() is taking about 14% of total runtime. > We ran the code for 100 timesteps, and preconditioner is built > everytime step. I am posting a sequence of calls to > KSPSolve_Richardson that shows getting PETScOptions adds up to a lot > of cost > > "KSPSolve_Richardson" 8.80e+05 12.7% > "PCApplyBAorAB" 5.25e+05 7.6% > "PCApply" 4.85e+05 7.0% > "PCApply_ASM" 4.85e+05 7.0% > "KSPSolve" 4.53e+05 6.6% > "KSPSolve_PREONLY" 2.06e+05 3.0% > "PetscObjectViewFromOptions" 3.19e+04 0.5% > "PetscObjectViewFromOptions" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.60e+04 0.2% > "PetscObjectViewFromOptions" 1.60e+04 0.2% > "KSPReasonViewFromOptions" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.56e+04 0.2% > "PetscObjectViewFromOptions" 7.98e+03 0.1% > "KSPSetUpOnBlocks" 7.98e+03 0.1% > "PetscOptionsGetBool" 7.98e+03 0.1% > "VecSet" 7.97e+03 0.1% > "PetscObjectViewFromOptions" 7.92e+03 0.1% > > Do you have some suggestions as to doing it in a fast way -- maybe parsing options only once in the simulation and making populating KSP > options essentially a no-op? > > Thanks, > --Amneet -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Jan 15 17:59:21 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 17:59:21 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> Message-ID: <6C48C329-D686-42F8-8C04-20B59485D742@mcs.anl.gov> Amneet, Thanks for bringing this to our attention. The long term design goal in PETSc is that the PetscOptions... calls are all made from the XXXSetFromOptions() calls and not within the numerical solver portions. Unfortunately this is not as easy to do as it might seem; hence there are a bunch of them scattered within the solver portions. In particular the worse culprit is KSPSolve(). You can run the following experiment: edit src/ksp/ksp/interface/itfunc.c and locate the function KSPSolve() now comment out all the lines with the work Option in them (I count about 17 of them) now do make gnumake in that directory (of course with optimized build) then rerun your exact same code that you report for from below. How much faster is the total time and how much percentage are the troublesome Options calls now? In other words how much does this change help? A dramatic difference would motivate us to fix this problem sooner rather than later. Barry > On Jan 15, 2016, at 5:33 PM, Bhalla, Amneet Pal S wrote: > > > Hi Barry, > > In our code at each timestep we build MG level smoothers using PETSc KSP solvers. We are using a PETSc function KSPSetFromOptions() > after we set some default values to the KSP. However, the profiler is showing that PetscOptionsFindPair_Private() is taking about 14% of total runtime. > We ran the code for 100 timesteps, and preconditioner is built everytime step. I am posting a sequence of calls to KSPSolve_Richardson that shows > getting PETScOptions adds up to a lot of cost > > "KSPSolve_Richardson" 8.80e+05 12.7% > "PCApplyBAorAB" 5.25e+05 7.6% > "PCApply" 4.85e+05 7.0% > "PCApply_ASM" 4.85e+05 7.0% > "KSPSolve" 4.53e+05 6.6% > "KSPSolve_PREONLY" 2.06e+05 3.0% > "PetscObjectViewFromOptions" 3.19e+04 0.5% > "PetscObjectViewFromOptions" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.60e+04 0.2% > "PetscObjectViewFromOptions" 1.60e+04 0.2% > "KSPReasonViewFromOptions" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.56e+04 0.2% > "PetscObjectViewFromOptions" 7.98e+03 0.1% > "KSPSetUpOnBlocks" 7.98e+03 0.1% > "PetscOptionsGetBool" 7.98e+03 0.1% > "VecSet" 7.97e+03 0.1% > "PetscObjectViewFromOptions" 7.92e+03 0.1% > > Do you have some suggestions as to doing it in a fast way -- maybe parsing options only once in the simulation and making populating KSP > options essentially a no-op? > > Thanks, > --Amneet From amneetb at live.unc.edu Fri Jan 15 18:15:00 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 00:15:00 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <8760yu4cyg.fsf@jedbrown.org> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> Message-ID: <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> On Jan 15, 2016, at 3:53 PM, Jed Brown > wrote: Do you need to create new objects versus merely resetting them? 
I suspect that calling KSPReset() between timesteps instead of creating a new object and calling KSPSetFromOptions() will fix your performance woes. We definitely need to destroy Mat associated with KSP everytime. This is a dynamic fluid-structure interaction problem on AMR grid, where the Cartesian grid and the structure moves at every timestep. Is it possible to reset a KSP with a different Mat? Are you suggesting to call KSPCreate() and KSPSetFromOptions() only once at the beginning of simulation, KSPReset() after every timestep, and KSPDestroy() at the end of the simulation? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 15 19:40:51 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 15 Jan 2016 19:40:51 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> Message-ID: On Fri, Jan 15, 2016 at 6:15 PM, Bhalla, Amneet Pal S wrote: > > > On Jan 15, 2016, at 3:53 PM, Jed Brown wrote: > > Do you need to create new objects versus merely resetting them? I > suspect that calling KSPReset() between timesteps instead of creating a > new object and calling KSPSetFromOptions() will fix your performance > woes. > > > We definitely need to destroy Mat associated with KSP everytime. This is a > dynamic fluid-structure > interaction problem on AMR grid, where the Cartesian grid and the > structure moves at every timestep. > Is it possible to reset a KSP with a different Mat? Are you suggesting to > call KSPCreate() and > KSPSetFromOptions() only once at the beginning of simulation, KSPReset() > after every timestep, > and KSPDestroy() at the end of the simulation? > That is how KSPReset() is supposed to work, but no one is really exercising it now. I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. However, Jed is correct that this is probably the best design. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jan 15 20:26:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 20:26:17 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> Message-ID: <91D183F4-2BCE-4A06-A13A-8330619E4207@mcs.anl.gov> SNES/KSPReset() destroys all the vectors and matrices but keeps all the options that have been set for the object. So using it saves rebuilding those objects. For large systems Reset would save only a trivial amount of rebuild. > On Jan 15, 2016, at 6:15 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 15, 2016, at 3:53 PM, Jed Brown wrote: >> >> Do you need to create new objects versus merely resetting them? I >> suspect that calling KSPReset() between timesteps instead of creating a >> new object and calling KSPSetFromOptions() will fix your performance >> woes. > > We definitely need to destroy Mat associated with KSP everytime. 
This is a dynamic fluid-structure > interaction problem on AMR grid, where the Cartesian grid and the structure moves at every timestep. > Is it possible to reset a KSP with a different Mat? Are you suggesting to call KSPCreate() and > KSPSetFromOptions() only once at the beginning of simulation, KSPReset() after every timestep, > and KSPDestroy() at the end of the simulation? From amneetb at live.unc.edu Fri Jan 15 23:34:22 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 05:34:22 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> Message-ID: <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Sat Jan 16 07:12:03 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sat, 16 Jan 2016 13:12:03 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> Message-ID: <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 16 12:20:56 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 12:20:56 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: > On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene wrote: > > >> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S wrote: >> >> >> >>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote: >>> >>> I am inclined to try >>> Barry's experiment first, since this may have bugs that we have not yet discovered. >> >> Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. 
>> If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) >> and not KSPSetFromOptions() itself (1.6%). > > Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. > > Thanks, > > -- Boyce From amneetb at live.unc.edu Sat Jan 16 15:00:08 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 21:00:08 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu>, Message-ID: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it's the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. 
Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Sat Jan 16 15:04:37 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sat, 16 Jan 2016 21:04:37 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S > wrote: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) Thanks, -- Boyce Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Sat Jan 16 15:06:46 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 21:06:46 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> , Message-ID: Does Instruments save results somewhere (like in a cascade view) that I can send to Barry? 
--Amneet Bhalla On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S > wrote: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it's the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) Thanks, -- Boyce Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Sat Jan 16 15:10:56 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sat, 16 Jan 2016 21:10:56 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> On Jan 16, 2016, at 4:06 PM, Bhalla, Amneet Pal S > wrote: Does Instruments save results somewhere (like in a cascade view) that I can send to Barry? Yes --- "save as..." will save the current trace, and then you can open it back up. -- Boyce --Amneet Bhalla On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S > wrote: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. 
Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) Thanks, -- Boyce Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 16 15:13:38 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 15:13:38 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> Message-ID: > On Jan 16, 2016, at 3:10 PM, Griffith, Boyce Eugene wrote: > > >> On Jan 16, 2016, at 4:06 PM, Bhalla, Amneet Pal S wrote: >> >> Does Instruments save results somewhere (like in a cascade view) that I can send to Barry? > > Yes --- "save as..." will save the current trace, and then you can open it back up. Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won't. Barry > > -- Boyce > >> --Amneet Bhalla >> >> On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene wrote: >> >>> >>>> On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S wrote: >>>> >>>> >>>> >>>> --Amneet Bhalla >>>> >>>> On Jan 16, 2016, at 10:21 AM, Barry Smith wrote: >>>> >>>>> >>>>>> On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene wrote: >>>>>> >>>>>> >>>>>>> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote: >>>>>>>> >>>>>>>> I am inclined to try >>>>>>>> Barry's experiment first, since this may have bugs that we have not yet discovered. >>>>>>> >>>>>>> Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. 
>>>>>>> If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) >>>>>>> and not KSPSetFromOptions() itself (1.6%). >>>>>> >>>>>> Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? >>>>> >>>>> No that is a different issue. >>>>> >>>>> In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. >>>>> >>>>> Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks >>>> >>>> Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. >>> >>> Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) >>> >>> Thanks, >>> >>> -- Boyce >>> >>>> >>>> Let me know if you would like to try that. >>>>> >>>>> Barry >>>>> >>>>> * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. >>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -- Boyce >>>>> >>> > From knepley at gmail.com Sat Jan 16 08:41:44 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 16 Jan 2016 08:41:44 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: On Sat, Jan 16, 2016 at 7:12 AM, Griffith, Boyce Eugene < boyceg at email.unc.edu> wrote: > > On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: > > > > On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote: > > I am inclined to try > Barry's experiment first, since this may have bugs that we have not yet > discovered. > > > Ok, I tried Barry?s suggestion. The runtime for > PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. > If I am getting it right, it?s the petsc options in the KSPSolve() that is > sucking up nontrivial amount of time (14 - 1.6) > and not KSPSetFromOptions() itself (1.6%). > > > Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would > that also bypass these calls to PetscOptionsXXX? > No, we have to fix KSPSolve(). We will do it right now. Thanks, Matt > Thanks, > > -- Boyce > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
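For readers following along, a minimal sketch of the KSP-reuse pattern Boyce is asking about, i.e. configuring a solver once and calling it many times so that KSPSetFromOptions() and its options-database queries run once rather than per solve. All names here are illustrative and this is not the code under discussion; it is only the generic pattern.

#include <petscksp.h>

/* Sketch: one configured KSP reused for many right-hand sides. */
PetscErrorCode solve_many(Mat A, Vec *rhs, Vec *sol, PetscInt nsolves)
{
  KSP            ksp;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* options database read here, once */
  for (i = 0; i < nsolves; ++i) {
    ierr = KSPSolve(ksp, rhs[i], sol[i]);CHKERRQ(ierr);
  }
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Note that, as Barry points out above, the queries at issue in this thread are made inside KSPSolve() itself, so this pattern only amortizes the user-side setup; removing the per-solve queries required the PETSc-side change Matt mentions.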
URL: From amneetb at live.unc.edu Sat Jan 16 17:09:26 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 23:09:26 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> Message-ID: <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> On Jan 16, 2016, at 1:13 PM, Barry Smith > wrote: Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. http://hpctoolkit.org/download/hpcviewer/ Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling under Calling Context View, Callers View and Flat View. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hpctoolkit-main2d-database.zip Type: application/zip Size: 1076038 bytes Desc: hpctoolkit-main2d-database.zip URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 16 17:46:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 17:46:17 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: Just as I feared. HPC software with bad dependencies, oh well charging ahead anyways > On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: > > > > >> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >> >> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. > > http://hpctoolkit.org/download/hpcviewer/ > > Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to > fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling > under Calling Context View, Callers View and Flat View. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Untitled.png Type: image/png Size: 748263 bytes Desc: not available URL: From amneetb at live.unc.edu Sat Jan 16 17:58:51 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 23:58:51 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: On Jan 16, 2016, at 3:46 PM, Barry Smith > wrote: Just as I feared. HPC software with bad dependencies, oh well charging ahead anyways Hmm... I have latest Java on my system. Can you try downloading it on a different browser (say Chrome)? Probably Safari is trying to unzip the file itself. You need to unzip by command line as Justin suggested. [cid:97DF211D-D876-4B67-B6E7-E002DA2EE95A] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2016-01-16 at 3.53.58 PM.png Type: image/png Size: 3467612 bytes Desc: Screen Shot 2016-01-16 at 3.53.58 PM.png URL: From bsmith at mcs.anl.gov Sat Jan 16 18:00:14 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 18:00:14 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. Barry > On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >> >> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. > > http://hpctoolkit.org/download/hpcviewer/ > > Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to > fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling > under Calling Context View, Callers View and Flat View. 
> > > > > From bsmith at mcs.anl.gov Sat Jan 16 18:05:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 18:05:17 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: No problem, I got it installed. I just like grumble about HPC people who use Java :-) > On Jan 16, 2016, at 5:58 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 16, 2016, at 3:46 PM, Barry Smith wrote: >> >> Just as I feared. HPC software with bad dependencies, oh well charging ahead anyways > > Hmm... I have latest Java on my system. Can you try downloading it on a different browser (say Chrome)? Probably Safari is > trying to unzip the file itself. You need to unzip by command line as Justin suggested. > > > From amneetb at live.unc.edu Sat Jan 16 18:07:31 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sun, 17 Jan 2016 00:07:31 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: <6F735708-A361-4084-A0B6-C7C7F0AB7B27@ad.unc.edu> On Jan 16, 2016, at 4:00 PM, Barry Smith > wrote: If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. I am using ?next? where Matt has pushed some code for multiplicative ASM (MSM). Is it available there too? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jan 16 18:08:34 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 16 Jan 2016 18:08:34 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <6F735708-A361-4084-A0B6-C7C7F0AB7B27@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <6F735708-A361-4084-A0B6-C7C7F0AB7B27@ad.unc.edu> Message-ID: On Sat, Jan 16, 2016 at 6:07 PM, Bhalla, Amneet Pal S wrote: > > > On Jan 16, 2016, at 4:00 PM, Barry Smith wrote: > > If you are using the master branch of PETSc two users gave us a nifty new > profiler that is "PETSc style" but shows the hierarchy of PETSc solvers > time and flop etc. > > > I am using ?next? where Matt has pushed some code for multiplicative ASM > (MSM). Is it available there too? > That should now be in 'master'. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
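For context on where the "little LU factorizations" Barry sees in the profile live: with PCASM each local block owns its own KSP/PC, and those can be retrieved and reconfigured after setup. A rough sketch of switching the subdomain solves from exact LU to ILU(k) follows; it is purely illustrative and assumes a KSP whose operators are already set and whose PC is ASM (e.g. via -pc_type asm), not the actual solver configuration used in this thread.

#include <petscksp.h>

/* Sketch: reconfigure the ASM subdomain solvers to ILU(2). */
PetscErrorCode tweak_subdomain_solvers(KSP ksp)
{
  PC             pc, subpc;
  KSP           *subksp;
  PetscInt       nlocal, first, i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* subdomain KSPs exist only after setup */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCASMGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
  for (i = 0; i < nlocal; ++i) {
    ierr = KSPSetType(subksp[i], KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
    ierr = PCSetType(subpc, PCILU);CHKERRQ(ierr);
    ierr = PCFactorSetLevels(subpc, 2);CHKERRQ(ierr);  /* ILU(2) instead of full LU */
  }
  PetscFunctionReturn(0);
}

The same effect is more commonly obtained from the command line with -sub_ksp_type preonly -sub_pc_type ilu -sub_pc_factor_levels 2, which is the form the options take elsewhere in this thread.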
URL: From boyceg at email.unc.edu Sat Jan 16 20:25:50 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sun, 17 Jan 2016 02:25:50 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> > On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: > > > Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks? Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance. -- Boyce > If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. > > Barry > >> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: >> >> >> >>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >>> >>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. >> >> http://hpctoolkit.org/download/hpcviewer/ >> >> Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to >> fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling >> under Calling Context View, Callers View and Flat View. >> >> >> >> >> > From bsmith at mcs.anl.gov Sat Jan 16 21:46:45 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 21:46:45 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> Message-ID: Boyce, Of course anything is possible in software. But I expect an optimization to not rebuild common submatrices/factorization requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE). 
I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuses a common one). The PCApply_ASM() should be hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in PCSetUp_ASM() (and maybe the common sub matrices) then the PCDestroy_ASM() should also work unchanged Good luck, Barry > On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene wrote: > > >> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: >> >> >> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. > > Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks? > > Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance. > > -- Boyce > >> If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. >> >> Barry >> >>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: >>> >>> >>> >>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >>>> >>>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. >>> >>> http://hpctoolkit.org/download/hpcviewer/ >>> >>> Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to >>> fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling >>> under Calling Context View, Callers View and Flat View. >>> >>> >>> >>> >>> >> > From boyceg at email.unc.edu Sun Jan 17 10:13:15 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sun, 17 Jan 2016 16:13:15 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> Message-ID: <1A4122A1-E0D9-4E81-8636-D3C4163298A5@email.unc.edu> Barry -- Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU? Thanks, -- Boyce > On Jan 16, 2016, at 10:46 PM, Barry Smith wrote: > > > Boyce, > > Of course anything is possible in software. 
But I expect an optimization to not rebuild common submatrices/factorization requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE). > > I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuses a common one). The PCApply_ASM() should be hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in > PCSetUp_ASM() (and maybe the common sub matrices) then the PCDestroy_ASM() should also work unchanged > > Good luck, > > Barry > >> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene wrote: >> >> >>> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: >>> >>> >>> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. >> >> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks? >> >> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance. >> >> -- Boyce >> >>> If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. >>> >>> Barry >>> >>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: >>>> >>>> >>>> >>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >>>>> >>>>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. >>>> >>>> http://hpctoolkit.org/download/hpcviewer/ >>>> >>>> Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to >>>> fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling >>>> under Calling Context View, Callers View and Flat View. 
>>>> >>>> >>>> >>>> >>>> >>> >> > From knepley at gmail.com Sun Jan 17 12:17:21 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 17 Jan 2016 12:17:21 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <1A4122A1-E0D9-4E81-8636-D3C4163298A5@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> <1A4122A1-E0D9-4E81-8636-D3C4163298A5@email.unc.edu> Message-ID: On Sun, Jan 17, 2016 at 10:13 AM, Griffith, Boyce Eugene < boyceg at email.unc.edu> wrote: > Barry -- > > Another random thought --- are these smallish direct solves things that > make sense to (try to) offload to a GPU? > Possibly, but the only clear-cut wins are for BLAS3, so we would need to stack up the identical solves. Matt > Thanks, > > -- Boyce > > > On Jan 16, 2016, at 10:46 PM, Barry Smith wrote: > > > > > > Boyce, > > > > Of course anything is possible in software. But I expect an > optimization to not rebuild common submatrices/factorization requires a > custom PCSetUp_ASM() rather than some PETSc option that we could add > (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE). > > > > I would start by copying PCSetUp_ASM(), stripping out all the setup > stuff that doesn't relate to your code and then mark identical domains so > you don't need to call MatGetSubMatrices() on those domains and don't > create a new KSP for each one of those subdomains (but reuses a common > one). The PCApply_ASM() should be hopefully be reusable so long as you have > created the full array of KSP objects (some of which will be common). If > you increase the reference counts of the common KSP in > > PCSetUp_ASM() (and maybe the common sub matrices) then the > PCDestroy_ASM() should also work unchanged > > > > Good luck, > > > > Barry > > > >> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene < > boyceg at email.unc.edu> wrote: > >> > >> > >>> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: > >>> > >>> > >>> Ok, I looked at your results in hpcviewer and don't see any surprises. > The PETSc time is in the little LU factorizations, the LU solves and the > matrix-vector products as it should be. Not much can be done on speeding > these except running on machines with high memory bandwidth. > >> > >> Looks like LU factorizations are about 25% for this particular case. > Many of these little subsystems are going to be identical (many will > correspond to constant coefficient Stokes), and it is fairly easy to figure > out which are which. How hard would it be to modify PCASM to allow for the > specification of one or more "default" KSPs that can be used for specified > blocks? > >> > >> Of course, we'll also look into tweaking the subdomain solves --- it > may not even be necessary to do exact subdomain solves to get reasonable MG > performance. > >> > >> -- Boyce > >> > >>> If you are using the master branch of PETSc two users gave us a nifty > new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers > time and flop etc. You can run with -log_view :filename.xml:ascii_xml and > then open the file with a browser (for example open -f Safari filename.xml) > or email the file. 
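To make Barry's outline above a little more concrete: the heart of the idea is that identical subdomains share one factored KSP instead of each owning their own, with reference counting keeping the usual destruction path correct. The following is a highly simplified, hypothetical sketch of just that bookkeeping; the arrays and the function itself are invented for illustration, and a real custom PCSetUp_ASM() does considerably more (including skipping MatGetSubMatrices() for the repeated blocks).

#include <petscksp.h>

/* Hypothetical sketch: one shared KSP for all subdomains marked identical,
   a private KSP for the rest. */
PetscErrorCode assign_subdomain_ksps(Mat *submats, PetscBool *is_identical,
                                     PetscInt n, KSP *subksp)
{
  KSP            common = NULL;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  for (i = 0; i < n; ++i) {
    if (is_identical[i]) {
      if (!common) {                       /* set up the representative block once */
        ierr = KSPCreate(PETSC_COMM_SELF, &common);CHKERRQ(ierr);
        ierr = KSPSetOptionsPrefix(common, "sub_");CHKERRQ(ierr);
        ierr = KSPSetOperators(common, submats[i], submats[i]);CHKERRQ(ierr);
        ierr = KSPSetFromOptions(common);CHKERRQ(ierr);
      }
      ierr = PetscObjectReference((PetscObject)common);CHKERRQ(ierr);
      subksp[i] = common;                  /* shared: one factorization serves many blocks */
    } else {
      ierr = KSPCreate(PETSC_COMM_SELF, &subksp[i]);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(subksp[i], "sub_");CHKERRQ(ierr);
      ierr = KSPSetOperators(subksp[i], submats[i], submats[i]);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(subksp[i]);CHKERRQ(ierr);
    }
  }
  if (common) { ierr = KSPDestroy(&common);CHKERRQ(ierr); }  /* drop the creation reference; shared slots keep theirs */
  PetscFunctionReturn(0);
}

The extra reference taken for each shared slot is what lets a per-subdomain KSPDestroy() in the destroy path work unchanged, which is the point Barry makes about increasing the reference counts.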
> >>> > >>> Barry > >>> > >>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S < > amneetb at live.unc.edu> wrote: > >>>> > >>>> > >>>> > >>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: > >>>>> > >>>>> Either way is fine so long as I don't have to install a ton of > stuff; which it sounds like I won?t. > >>>> > >>>> http://hpctoolkit.org/download/hpcviewer/ > >>>> > >>>> Unzip HPCViewer for MacOSX with command line and drag the unzipped > folder to Applications. You will be able to > >>>> fire HPCViewer from LaunchPad. Point it to this attached directory. > You will be able to see three different kind of profiling > >>>> under Calling Context View, Callers View and Flat View. > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jan 18 08:29:30 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 18 Jan 2016 15:29:30 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: > > > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: > > > > Hoang Giang Bui writes: > >> One more question I like to ask, which is more on the performance of the > >> solver. That if I have a coupled problem, says the point block is [u_x > u_y > >> u_z p] in which entries of p block in stiffness matrix is in a much > smaller > >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? > > > > You should scale the model (as Barry says). But the names of your > > variables suggest that the system is a saddle point problem, in which > > case there's a good chance AMG won't work at all. For example, > > BoomerAMG produces a singular preconditioner in similar contexts, such > > that the preconditioned residual drops smoothly while the true residual > > stagnates (the equations are not solved at all). So be vary careful if > > you think it's "working". > > Using block size 4 with the scaling, the hypre AMG does not converge. So it's somehow right. > The PCFIEDSPLIT preconditioner is designed for helping to solve saddle > point problems. > > > Does PCFIELDSPLIT support variable block size? For example using P2/P1 discretization, the number of nodes carrying [u_x u_y u_z] is different with number of nodes carrying p. PCFieldSplitSetBlockSize would not be correct in this case. Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 18 08:58:10 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 18 Jan 2016 08:58:10 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: On Mon, Jan 18, 2016 at 8:29 AM, Hoang Giang Bui wrote: > > > On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: > >> >> > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: >> > >> > Hoang Giang Bui writes: >> >> One more question I like to ask, which is more on the performance of >> the >> >> solver. 
That if I have a coupled problem, says the point block is [u_x >> u_y >> >> u_z p] in which entries of p block in stiffness matrix is in a much >> smaller >> >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still >> scale? >> > >> > You should scale the model (as Barry says). But the names of your >> > variables suggest that the system is a saddle point problem, in which >> > case there's a good chance AMG won't work at all. For example, >> > BoomerAMG produces a singular preconditioner in similar contexts, such >> > that the preconditioned residual drops smoothly while the true residual >> > stagnates (the equations are not solved at all). So be vary careful if >> > you think it's "working". >> >> > > Using block size 4 with the scaling, the hypre AMG does not converge. So > it's somehow right. > > > >> The PCFIEDSPLIT preconditioner is designed for helping to solve saddle >> point problems. >> >> >> > > Does PCFIELDSPLIT support variable block size? For example using P2/P1 > discretization, the number of nodes carrying [u_x u_y u_z] is different > with number of nodes carrying p. PCFieldSplitSetBlockSize would not be > correct in this case. > You misunderstand the blocking. You would put ALL velocities (P2) in one block and ALL pressure (P1) in another. The PCFieldSplitSetBlockSize() call is for co-located discretizations, which P2/P2 is not. Matt > Giang > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jan 18 09:42:00 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 18 Jan 2016 16:42:00 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: Why P2/P2 is not for co-located discretization? However, it's not my question. The P2/P1 which I used generate variable block size at each node. That was fine if I used PCFieldSplitSetIS for each components, displacements and pressures. But how to set the block size (3) for displacement block? Giang On Mon, Jan 18, 2016 at 3:58 PM, Matthew Knepley wrote: > On Mon, Jan 18, 2016 at 8:29 AM, Hoang Giang Bui > wrote: > >> >> >> On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: >> >>> >>> > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: >>> > >>> > Hoang Giang Bui writes: >>> >> One more question I like to ask, which is more on the performance of >>> the >>> >> solver. That if I have a coupled problem, says the point block is >>> [u_x u_y >>> >> u_z p] in which entries of p block in stiffness matrix is in a much >>> smaller >>> >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still >>> scale? >>> > >>> > You should scale the model (as Barry says). But the names of your >>> > variables suggest that the system is a saddle point problem, in which >>> > case there's a good chance AMG won't work at all. For example, >>> > BoomerAMG produces a singular preconditioner in similar contexts, such >>> > that the preconditioned residual drops smoothly while the true residual >>> > stagnates (the equations are not solved at all). So be vary careful if >>> > you think it's "working". >>> >>> >> >> Using block size 4 with the scaling, the hypre AMG does not converge. So >> it's somehow right. 
>> >> >> >>> The PCFIEDSPLIT preconditioner is designed for helping to solve >>> saddle point problems. >>> >>> >>> >> >> Does PCFIELDSPLIT support variable block size? For example using P2/P1 >> discretization, the number of nodes carrying [u_x u_y u_z] is different >> with number of nodes carrying p. PCFieldSplitSetBlockSize would not be >> correct in this case. >> > > You misunderstand the blocking. You would put ALL velocities (P2) in one > block and ALL pressure (P1) in another. > The PCFieldSplitSetBlockSize() call is for co-located discretizations, > which P2/P2 is not. > > Matt > > >> Giang >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 18 09:54:56 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 18 Jan 2016 09:54:56 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: On Mon, Jan 18, 2016 at 9:42 AM, Hoang Giang Bui wrote: > Why P2/P2 is not for co-located discretization? However, it's not my > question. The P2/P1 which I used generate variable block size at each node. > That was fine if I used PCFieldSplitSetIS for each components, > displacements and pressures. But how to set the block size (3) for > displacement block? > P2/P1 does not generate block matrices, and is not col-located, because the variables are located at different sets of nodes. You can use PCFieldSplitSetIS() to specify the splits. This is the right method for P2/P1. Setting the block size for the P2 block is not crucial. When its working we can do that. Matt > Giang > > On Mon, Jan 18, 2016 at 3:58 PM, Matthew Knepley > wrote: > >> On Mon, Jan 18, 2016 at 8:29 AM, Hoang Giang Bui >> wrote: >> >>> >>> >>> On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: >>> >>>> >>>> > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: >>>> > >>>> > Hoang Giang Bui writes: >>>> >> One more question I like to ask, which is more on the performance of >>>> the >>>> >> solver. That if I have a coupled problem, says the point block is >>>> [u_x u_y >>>> >> u_z p] in which entries of p block in stiffness matrix is in a much >>>> smaller >>>> >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still >>>> scale? >>>> > >>>> > You should scale the model (as Barry says). But the names of your >>>> > variables suggest that the system is a saddle point problem, in which >>>> > case there's a good chance AMG won't work at all. For example, >>>> > BoomerAMG produces a singular preconditioner in similar contexts, such >>>> > that the preconditioned residual drops smoothly while the true >>>> residual >>>> > stagnates (the equations are not solved at all). So be vary careful >>>> if >>>> > you think it's "working". >>>> >>>> >>> >>> Using block size 4 with the scaling, the hypre AMG does not converge. So >>> it's somehow right. >>> >>> >>> >>>> The PCFIEDSPLIT preconditioner is designed for helping to solve >>>> saddle point problems. >>>> >>>> >>>> >>> >>> Does PCFIELDSPLIT support variable block size? For example using P2/P1 >>> discretization, the number of nodes carrying [u_x u_y u_z] is different >>> with number of nodes carrying p. PCFieldSplitSetBlockSize would not be >>> correct in this case. >>> >> >> You misunderstand the blocking. 
You would put ALL velocities (P2) in one >> block and ALL pressure (P1) in another. >> The PCFieldSplitSetBlockSize() call is for co-located discretizations, >> which P2/P2 is not. >> >> Matt >> >> >>> Giang >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jan 18 10:25:42 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 18 Jan 2016 09:25:42 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: <87si1ug8hl.fsf@jedbrown.org> Hoang Giang Bui writes: > Why P2/P2 is not for co-located discretization? Matt typed "P2/P2" when me meant "P2/P1". -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From tabrezali at gmail.com Tue Jan 19 17:07:12 2016 From: tabrezali at gmail.com (Tabrez Ali) Date: Tue, 19 Jan 2016 17:07:12 -0600 Subject: [petsc-users] external packages Message-ID: <569EC1A0.3020603@gmail.com> Hello W.r.t. to external packages, does "--download-xyz=yes" implicitly means "--with-xyz=1" Also, is "--download-xyz=yes" exactly same as "--download-xyz" Regards, Tabrez From jed at jedbrown.org Tue Jan 19 17:10:56 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 19 Jan 2016 16:10:56 -0700 Subject: [petsc-users] external packages In-Reply-To: <569EC1A0.3020603@gmail.com> References: <569EC1A0.3020603@gmail.com> Message-ID: <8737ttdv27.fsf@jedbrown.org> Tabrez Ali writes: > Hello > > W.r.t. to external packages, does "--download-xyz=yes" implicitly means > "--with-xyz=1" Yes. > Also, is "--download-xyz=yes" exactly same as "--download-xyz" Yes, though --download-xyz=/path/to/xyz.tar.gz has semantic meaning. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From salazardetro1 at llnl.gov Wed Jan 20 11:35:29 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 20 Jan 2016 17:35:29 +0000 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity Message-ID: Hello I am trying to speed up a two dimensional linear elasticity problem with isotropic and heterogeneous properties. It is a topology optimization problem, therefore some regions have an almost zero stiffness whereas other regions have a higher value, making the matrix ill-conditioned. So far, from having searched mail lists on similar problems, I have come up with the following CL options to pass to the petsc solver (two dimensional problem): -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 It works reasonably well and shows similar number of iterations for different levels of refinement. However, it does not converge when I use the same options for KSPSolveTranspose(). 
I obtain DIVERGED_INDEFINITE_PC after three iterations. I believe this has to do with the field split, but I do not where to start. I am using libMesh which interfaces with petsc through the file petsc_linear solver.C (http://libmesh.github.io/doxygen/classlibMesh_1_1PetscLinearSolver.html#a4e66cc138b52e80e93a75e55315245ee) The KSPSolveTranspose() is called in adjoint_solve(). Changing that to KSPSolve() solves the issue and to me it is not a problem because my matrix is symmetric, but I don?t want to have to change it in the libMesh source code. So the question is, why do those CL options not work for the KSPSolveTranspose() despite having a symmetric matrix? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 20 11:47:59 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 20 Jan 2016 11:47:59 -0600 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: Message-ID: On Wed, Jan 20, 2016 at 11:35 AM, Salazar De Troya, Miguel < salazardetro1 at llnl.gov> wrote: > Hello > > I am trying to speed up a two dimensional linear elasticity problem with > isotropic and heterogeneous properties. It is a topology optimization > problem, therefore some regions have an almost zero stiffness whereas other > regions have a higher value, making the matrix ill-conditioned. So far, > from having searched mail lists on similar problems, I have come up with > the following CL options to pass to the petsc solver (two dimensional > problem): > > -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 > -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg > -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 > -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 > > It works reasonably well and shows similar number of iterations for > different levels of refinement. However, it does not converge when I use > the same options for KSPSolveTranspose(). I obtain DIVERGED_INDEFINITE_PC > after three iterations. I believe this has to do with the field split, > but I do not where to start. I am using libMesh which interfaces with petsc > through the file petsc_linear solver.C ( > http://libmesh.github.io/doxygen/classlibMesh_1_1PetscLinearSolver.html#a4e66cc138b52e80e93a75e55315245ee) > The KSPSolveTranspose() is called in adjoint_solve(). Changing that to > KSPSolve() solves the issue and to me it is not a problem because my matrix > is symmetric, but I don?t want to have to change it in the libMesh source > code. So the question is, why do those CL options not work for the > KSPSolveTranspose() despite having a symmetric matrix? > 1) Are you sure the matrix itself is symmetric? It could have boundary conditions that break this symmetry. 2) This sounds like a bug in PCApplyTranspose_FieldSplit() since I am almost certain it is not tested. 3) Can you send the matrix and rhs? This should be easy by using MatView() for a binary viewer. Thanks, Matt > Thanks > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
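For anyone wanting to reproduce the two checks Matt asks for above, a small sketch of the standard way to test symmetry and dump the matrix and right-hand side to PETSc binary files. The file names are arbitrary (they just match the attachments that follow), and the symmetry tolerance is an arbitrary choice.

#include <petscksp.h>

/* Sketch: check symmetry, then write A and b so they can be reloaded
   later with MatLoad()/VecLoad(). */
PetscErrorCode dump_system(Mat A, Vec b)
{
  PetscViewer    viewer;
  PetscBool      symm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatIsSymmetric(A, 1.0e-12, &symm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Matrix symmetric: %s\n", symm ? "yes" : "no");CHKERRQ(ierr);

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "stiffness_matrix", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs_vector", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(b, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The resulting files can be read back by opening a binary viewer with FILE_MODE_READ and calling MatLoad()/VecLoad().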
URL: From salazardetro1 at llnl.gov Wed Jan 20 12:22:19 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 20 Jan 2016 18:22:19 +0000 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: Message-ID: I am not 100% confident because I am not sure how libMesh handles the Dirichlet boundary conditions, I checked it with MatIsSymmetric() and obtained 1 as the response. Please find attached the binary files (I set up the viewer with PetscViewerSetFormat(viewer, PETSC_VIEWER_DEFAULT); hope that?s the right way) Thanks From: Matthew Knepley > Date: Wednesday, January 20, 2016 at 9:47 AM To: Miguel Salazar > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity On Wed, Jan 20, 2016 at 11:35 AM, Salazar De Troya, Miguel > wrote: Hello I am trying to speed up a two dimensional linear elasticity problem with isotropic and heterogeneous properties. It is a topology optimization problem, therefore some regions have an almost zero stiffness whereas other regions have a higher value, making the matrix ill-conditioned. So far, from having searched mail lists on similar problems, I have come up with the following CL options to pass to the petsc solver (two dimensional problem): -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 It works reasonably well and shows similar number of iterations for different levels of refinement. However, it does not converge when I use the same options for KSPSolveTranspose(). I obtain DIVERGED_INDEFINITE_PC after three iterations. I believe this has to do with the field split, but I do not where to start. I am using libMesh which interfaces with petsc through the file petsc_linear solver.C (http://libmesh.github.io/doxygen/classlibMesh_1_1PetscLinearSolver.html#a4e66cc138b52e80e93a75e55315245ee) The KSPSolveTranspose() is called in adjoint_solve(). Changing that to KSPSolve() solves the issue and to me it is not a problem because my matrix is symmetric, but I don?t want to have to change it in the libMesh source code. So the question is, why do those CL options not work for the KSPSolveTranspose() despite having a symmetric matrix? 1) Are you sure the matrix itself is symmetric? It could have boundary conditions that break this symmetry. 2) This sounds like a bug in PCApplyTranspose_FieldSplit() since I am almost certain it is not tested. 3) Can you send the matrix and rhs? This should be easy by using MatView() for a binary viewer. Thanks, Matt Thanks -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rhs_vector Type: application/octet-stream Size: 105624 bytes Desc: rhs_vector URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rhs_vector.info Type: application/octet-stream Size: 22 bytes Desc: rhs_vector.info URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: stiffness_matrix Type: application/octet-stream Size: 2846472 bytes Desc: stiffness_matrix URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: stiffness_matrix.info Type: application/octet-stream Size: 22 bytes Desc: stiffness_matrix.info URL: From jed at jedbrown.org Wed Jan 20 18:36:09 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 20 Jan 2016 17:36:09 -0700 Subject: [petsc-users] [petsc-maint] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: Message-ID: <87lh7jbwg6.fsf@jedbrown.org> "Salazar De Troya, Miguel" writes: > I am not 100% confident because I am not sure how libMesh handles the > Dirichlet boundary conditions, How are you using libmesh? Normally you write the boundary conditions. Many of the examples use penalty conditions, which maintain symmetric but have other problems. > I am trying to speed up a two dimensional linear elasticity problem with isotropic and heterogeneous properties. It is a topology optimization problem, therefore some regions have an almost zero stiffness whereas other regions have a higher value, making the matrix ill-conditioned. So far, from having searched mail lists on similar problems, I have come up with the following CL options to pass to the petsc solver (two dimensional problem): > > -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 This option looks funny. What are you trying to do here? > -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From salazardetro1 at llnl.gov Thu Jan 21 10:17:11 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Thu, 21 Jan 2016 16:17:11 +0000 Subject: [petsc-users] [petsc-maint] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: <87lh7jbwg6.fsf@jedbrown.org> References: <87lh7jbwg6.fsf@jedbrown.org> Message-ID: I write the boundary conditions using their DirichletBoundary class, not the penalty term. The options I?m using are ones that I found in the libMesh mail list from a user who suggested them for elasticity problems. The idea he mentioned was to use field split to separate each field of the displacement vector solution. I honestly do not know the role of -pc_fieldsplit_type symmetric_multiplicative, but it was working for me. On 1/20/16, 4:36 PM, "Jed Brown" wrote: >"Salazar De Troya, Miguel" writes: > >> I am not 100% confident because I am not sure how libMesh handles the >> Dirichlet boundary conditions, > >How are you using libmesh? Normally you write the boundary conditions. >Many of the examples use penalty conditions, which maintain symmetric >but have other problems. > >> I am trying to speed up a two dimensional linear elasticity problem >>with isotropic and heterogeneous properties. It is a topology >>optimization problem, therefore some regions have an almost zero >>stiffness whereas other regions have a higher value, making the matrix >>ill-conditioned. 
So far, from having searched mail lists on similar >>problems, I have come up with the following CL options to pass to the >>petsc solver (two dimensional problem): >> >> -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 >>-fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg >>-fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 > >This option looks funny. What are you trying to do here? > >> -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 From ptbauman at gmail.com Thu Jan 21 10:32:11 2016 From: ptbauman at gmail.com (Paul T. Bauman) Date: Thu, 21 Jan 2016 11:32:11 -0500 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior Message-ID: Greetings, We have a test that has started failing upon switching from 3.5.4 to 3.6.0 (actually went straight to 3.6.3 but checked this is repeatable with 3.6.0). I've attached the matrix generated with -mat_view binary and a small PETSc program that runs in serial that reproduces the behavior by loading the matrix and solving a linear system (RHS doesn't matter here). For context, this matrix is the Jacobian of a Taylor-Hood approximation of the axisymmetric incompressible Navier-Stokes equations for flow between concentric cylinders (for which there is an exact solution). The matrix is for a two element case, hopefully small enough for debugging. Using the following command line options with the test program works with PETSc 3.5.4 and gives a NAN residual with PETSc 3.6.0: PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4" If I remove the mat ordering option, all is well again in PETSc 3.6.x: PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_levels 4" Those options are nothing special. They were arrived at through trial/error to get decent behavior for the solver on up to 4 processors to keep the time to something reasonable for the test suite without getting really fancy. Specifically, we'd noticed this mat ordering on some problems in the test suite behaved noticeably better. As always, thanks for your time. Best, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.mat Type: application/octet-stream Size: 33024 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.mat.info Type: application/octet-stream Size: 65 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_mat.c Type: text/x-csrc Size: 1190 bytes Desc: not available URL: From hzhang at mcs.anl.gov Thu Jan 21 11:16:54 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 21 Jan 2016 11:16:54 -0600 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: Paul : Using petsc-dev (we recently added feature for better displaying convergence behavior), I found that '-sub_pc_factor_mat_ordering_type 1wd' causes zero pivot: ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4 -ksp_converged_reason Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0 PCSETUP_FAILED due to SUBPC_ERROR Number of iterations = 0 adding option '-info |grep zero' [0] MatPivotCheck_none(): Detected zero pivot in factorization in row 0 value 0. 
tolerance 2.22045e-14 or '-ksp_error_if_not_converged' [0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot [0]PETSC ERROR: Zero pivot row 0 value 0. tolerance 2.22045e-14 '-sub_pc_factor_mat_ordering_type natural' avoids it: ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type natural -sub_pc_factor_levels 4 -ksp_converged_reason Linear solve converged due to CONVERGED_RTOL iterations 1 Number of iterations = 1 Residual norm < 1.e-12 Hong Greetings, > > We have a test that has started failing upon switching from 3.5.4 to 3.6.0 > (actually went straight to 3.6.3 but checked this is repeatable with > 3.6.0). I've attached the matrix generated with -mat_view binary and a > small PETSc program that runs in serial that reproduces the behavior by > loading the matrix and solving a linear system (RHS doesn't matter here). > For context, this matrix is the Jacobian of a Taylor-Hood approximation of > the axisymmetric incompressible Navier-Stokes equations for flow between > concentric cylinders (for which there is an exact solution). The matrix is > for a two element case, hopefully small enough for debugging. > > Using the following command line options with the test program works with > PETSc 3.5.4 and gives a NAN residual with PETSc 3.6.0: > > PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu > -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4" > > If I remove the mat ordering option, all is well again in PETSc 3.6.x: > > PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu > -sub_pc_factor_levels 4" > > Those options are nothing special. They were arrived at through > trial/error to get decent behavior for the solver on up to 4 processors to > keep the time to something reasonable for the test suite without getting > really fancy. Specifically, we'd noticed this mat ordering on some problems > in the test suite behaved noticeably better. > > As always, thanks for your time. > > Best, > > Paul > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ptbauman at gmail.com Thu Jan 21 11:34:17 2016 From: ptbauman at gmail.com (Paul T. Bauman) Date: Thu, 21 Jan 2016 12:34:17 -0500 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: Thanks Hong, On Thu, Jan 21, 2016 at 12:16 PM, Hong wrote: > Paul : > Using petsc-dev (we recently added feature for better displaying > convergence behavior), > OK, good to know, thanks. > I found that '-sub_pc_factor_mat_ordering_type 1wd' causes zero pivot: > I figured it was something along these lines. So, just so I'm clear, likely this zero pivot was always there with this mat ordering (i.e. no mat ordering bits actually changed between 3.5 and 3.6) and this is a reflection of increased consistency checking in the newer PETSc? Thanks much, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jan 21 11:51:38 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 21 Jan 2016 11:51:38 -0600 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: Paul: It might be caused by our changes in default shift strategy. We previously used '-pc_factor_shift_type NONZERO' for ilu, then changed to '-pc_factor_shift_type NONE'. 
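If it is more convenient to request the shift from code than from the options database, here is a minimal sketch in C (ksp is an assumed, already configured KSP; for the ASM sub-solvers the -sub_pc_factor_shift_type option remains the simplest route):

#include <petscksp.h>

PC             pc;
PetscErrorCode ierr;
/* ask the factorization to shift zero pivots, as the pre-3.6 ilu default did */
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCFactorSetShiftType(pc,MAT_SHIFT_NONZERO);CHKERRQ(ierr);

MAT_SHIFT_INBLOCKS and MAT_SHIFT_POSITIVE_DEFINITE can be requested the same way.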
For your test, I get ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4 -ksp_converged_reason -sub_pc_factor_shift_type NONZERO Linear solve converged due to CONVERGED_RTOL iterations 2 Number of iterations = 2 Residual norm 0.0116896 with '-sub_pc_factor_shift_type INBLOCKS': Linear solve converged due to CONVERGED_RTOL iterations 2 Number of iterations = 2 Residual norm 0.00603736 I guess your previous run might use one of these options. Hong Thanks Hong, > > On Thu, Jan 21, 2016 at 12:16 PM, Hong wrote: > >> Paul : >> Using petsc-dev (we recently added feature for better displaying >> convergence behavior), >> > > OK, good to know, thanks. > > >> I found that '-sub_pc_factor_mat_ordering_type 1wd' causes zero pivot: >> > > I figured it was something along these lines. So, just so I'm clear, > likely this zero pivot was always there with this mat ordering (i.e. no mat > ordering bits actually changed between 3.5 and 3.6) and this is a > reflection of increased consistency checking in the newer PETSc? > > Thanks much, > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ptbauman at gmail.com Thu Jan 21 11:59:55 2016 From: ptbauman at gmail.com (Paul T. Bauman) Date: Thu, 21 Jan 2016 12:59:55 -0500 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: On Thu, Jan 21, 2016 at 12:51 PM, Hong wrote: > Paul: > It might be caused by our changes in default shift strategy. > We previously used '-pc_factor_shift_type NONZERO' for ilu, then changed > to '-pc_factor_shift_type NONE'. > For your test, I get > ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type > ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4 > -ksp_converged_reason -sub_pc_factor_shift_type NONZERO > Ah! That was the change. Confirmed, adding -sub_pc_factor_shift_type nonzero gets our test passing again with those options. Mystery solved. Thank you very much. Best, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From wen.zhao at outlook.fr Thu Jan 21 17:11:03 2016 From: wen.zhao at outlook.fr (wen zhao) Date: Fri, 22 Jan 2016 00:11:03 +0100 Subject: [petsc-users] addition of two matrix Message-ID: Hello, I want to add to matrix, but i haven't found a function which can do this operation. Is there existe a kind of operation can do C = A + B Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu Jan 21 17:43:11 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 22 Jan 2016 00:43:11 +0100 Subject: [petsc-users] addition of two matrix In-Reply-To: References: Message-ID: Try this http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatAXPY.html On 22 January 2016 at 00:11, wen zhao wrote: > Hello, > > I want to add to matrix, but i haven't found a function which can do this > operation. Is there existe a kind of operation can do C = A + B > > Thanks > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 21 18:52:56 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 22 Jan 2016 00:52:56 +0000 Subject: [petsc-users] Runtime for ILU(k) vs. LU Message-ID: <2A1B4293-6A96-4533-8405-85DF4D955880@ad.unc.edu> Hi Folks, Is there a general rule for runtime of ILU(k) vs. LU for some higher level k? 
In other words, after what value of 'k' one would be better off using LU in the preconditioner than ILU(k). Thanks, --Amneet From bsmith at mcs.anl.gov Thu Jan 21 19:03:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 21 Jan 2016 19:03:39 -0600 Subject: [petsc-users] Runtime for ILU(k) vs. LU In-Reply-To: <2A1B4293-6A96-4533-8405-85DF4D955880@ad.unc.edu> References: <2A1B4293-6A96-4533-8405-85DF4D955880@ad.unc.edu> Message-ID: <64BB108C-E291-40C1-83D8-F136710C2B5F@mcs.anl.gov> If ILU(0, 1, or 2) doesn't work well then ILU( n > 2) generally doesn't work well. > On Jan 21, 2016, at 6:52 PM, Bhalla, Amneet Pal S wrote: > > > Hi Folks, > > Is there a general rule for runtime of ILU(k) vs. LU for some higher level k? In other words, after what value of 'k' one > would be better off using LU in the preconditioner than ILU(k). > > Thanks, > --Amneet From jed at jedbrown.org Thu Jan 21 21:48:08 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 21 Jan 2016 19:48:08 -0800 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: <87lh7jbwg6.fsf@jedbrown.org> Message-ID: <87k2n28ebr.fsf@jedbrown.org> "Salazar De Troya, Miguel" writes: > I write the boundary conditions using their DirichletBoundary class, not > the penalty term. > The options I?m using are ones that I found in the libMesh mail list > from a user who suggested them for elasticity problems. The idea he > mentioned was to use field split to separate each field of the > displacement vector solution. I honestly do not know the role of > -pc_fieldsplit_type symmetric_multiplicative, but it was working for > me. I would use GAMG or ML (without fieldsplit; set a near null space) instead of Hypre, because the algorithm is usually better for elasticity. Fieldsplit can work for this, but the performance degrades for higher Poisson ratio. In any case, "-pc_fieldsplit_0 0,1" is ignored (I think) and "-pc_fieldsplit_0_fields 0,1" is likely not what you want. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From hgbk2008 at gmail.com Fri Jan 22 03:40:06 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Fri, 22 Jan 2016 10:40:06 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: <87si1ug8hl.fsf@jedbrown.org> References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Hi Matt I would rather like to set the block size for block P2 too. Why? Because in one of my test (for problem involves only [u_x u_y u_z]), the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it increases to 140 if block size is 1 (see attached files). This gives me the impression that AMG will give better inversion for "P2" block if I can set its block size to 3. Of course it's still an hypothesis but worth to try. Another question: In one of the Petsc presentation, you said the Hypre AMG does not scale well, because set up cost amortize the iterations. How is it quantified? and what is the memory overhead? Giang On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Why P2/P2 is not for co-located discretization? > > Matt typed "P2/P2" when me meant "P2/P1". > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
-------------- next part --------------
Mat BlockSize: 1
  0 KSP preconditioned resid norm 1.911887586816e+01 true resid norm 1.379276869721e+08 ||r(i)||/||b|| 1.000000000000e+00
  ...
138 KSP preconditioned resid norm 1.701096972675e-08 true resid norm 2.979989310769e+00 ||r(i)||/||b|| 2.160544685544e-08
Linear solve converged due to CONVERGED_RTOL iterations 138
KSP Object: 8 MPI processes, type: gmres (restart=300, Modified Gram-Schmidt), maximum iterations=300, tolerances: relative=1e-09, absolute=1e-20, left preconditioning, PRECONDITIONED norm
PC Object: 8 MPI processes, type: hypre (BoomerAMG, V-cycle, strong threshold 0.25, PMIS coarsening, classical interpolation, symmetric-SOR/Jacobi relaxation)
Mat Object: 8 MPI processes, type: mpiaij, rows=657685, cols=657685, total nonzeros=1.19268e+08
KSPSolve completed
-------------- next part --------------
Mat BlockSize: 3
  0 KSP preconditioned resid norm 3.922843899310e+01 true resid norm 1.379276869721e+08 ||r(i)||/||b|| 1.000000000000e+00
  ...
 50 KSP preconditioned resid norm 2.931961034973e-08 true resid norm 2.958079043384e+01 ||r(i)||/||b|| 2.144659356161e-07
Linear solve converged due to CONVERGED_RTOL iterations 50
KSP Object: 8 MPI processes, type: gmres (restart=300, Modified Gram-Schmidt), maximum iterations=300, tolerances: relative=1e-09, absolute=1e-20, left preconditioning, PRECONDITIONED norm
PC Object: 8 MPI processes, type: hypre (BoomerAMG, V-cycle, strong threshold 0.25, PMIS coarsening, classical interpolation, symmetric-SOR/Jacobi relaxation)
Mat Object: 8 MPI processes, type: mpiaij, rows=670170, cols=670170, bs=3, total nonzeros=1.22417e+08
KSPSolve completed
From knepley at gmail.com Fri Jan 22 04:15:48 2016
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 22 Jan 2016 04:15:48 -0600
Subject: [petsc-users] Why use MATMPIBAIJ?
In-Reply-To:
References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org>
Message-ID:
On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui wrote:
> Hi Matt
> I would rather like to set the block size for block P2 too. Why?
>
> Because in one of my test (for problem involves only [u_x u_y u_z]), the
> gmres + Hypre AMG converges in 50 steps with block size 3, whereby it
> increases to 140 if block size is 1 (see attached files).
>
You can still do that. It can be done with options once the decomposition
is working. It's true that these solvers work better with the block size
set. However, if it's the P2 Laplacian it does not really matter, since it
is uncoupled.

> This gives me the impression that AMG will give better inversion for "P2"
> block if I can set its block size to 3. Of course it's still an hypothesis
> but worth to try.
>
> Another question: In one of the Petsc presentation, you said the Hypre AMG
> does not scale well, because set up cost amortize the iterations. How is it
> quantified? and what is the memory overhead?
>
I said the Hypre setup cost is not scalable, but it can be amortized over
the iterations. You can quantify this just by looking at the PCSetUp time
as you increase the number of processes. I don't think they have a good
model for the memory usage, and if they do, I do not know what it is.
However, generally Hypre takes more memory than the agglomeration MG like
ML or GAMG.
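A rough way to quantify it on your own problem (a sketch, assuming an already assembled system with an application-provided ksp, b and x): put the setup and the solve in separate log stages, rerun at a few process counts with -log_summary, and compare how the two stage times grow.

#include <petscksp.h>

PetscLogStage  stage_setup, stage_solve;
PetscErrorCode ierr;
ierr = PetscLogStageRegister("AMG setup",&stage_setup);CHKERRQ(ierr);
ierr = PetscLogStageRegister("AMG solve",&stage_solve);CHKERRQ(ierr);
/* KSPSetUp triggers PCSetUp, which is where the BoomerAMG hierarchy is built */
ierr = PetscLogStagePush(stage_setup);CHKERRQ(ierr);
ierr = KSPSetUp(ksp);CHKERRQ(ierr);
ierr = PetscLogStagePop();CHKERRQ(ierr);
/* the Krylov iterations then amortize that one-time cost */
ierr = PetscLogStagePush(stage_solve);CHKERRQ(ierr);
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
ierr = PetscLogStagePop();CHKERRQ(ierr);

If the setup stage grows much faster than the solve stage as processes are added, the setup is the bottleneck; if it stays a small fraction of the total, it is being amortized.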
Thanks, Matt > > Giang > > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > >> Hoang Giang Bui writes: >> >> > Why P2/P2 is not for co-located discretization? >> >> Matt typed "P2/P2" when me meant "P2/P1". >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Fri Jan 22 07:27:38 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Fri, 22 Jan 2016 14:27:38 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: DO you mean the option pc_fieldsplit_block_size? In this thread: http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error It assumes you have a constant number of fields at each grid point, am I right? However, my field split is not constant, like [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y u3_z p_3 u4_x u4_y u4_z] Subsequently the fieldsplit is [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z u4_x u4_y u4_z] [p_1 p_3] Then what is the option to set block size 3 for split 0? Sorry, I search several forum threads but cannot figure out the options as you said. > You can still do that. It can be done with options once the decomposition > is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > Yes, I agree it's uncoupled with the other field, but the crucial factor defining the quality of the block preconditioner is the approximate inversion of individual block. I would merely try block Jacobi first, because it's quite simple. Nevertheless, fieldsplit implements other nice things, like Schur complement, etc. Giang On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui > wrote: > >> Hi Matt >> I would rather like to set the block size for block P2 too. Why? >> >> Because in one of my test (for problem involves only [u_x u_y u_z]), the >> gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >> increases to 140 if block size is 1 (see attached files). >> > > You can still do that. It can be done with options once the decomposition > is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > This gives me the impression that AMG will give better inversion for "P2" >> block if I can set its block size to 3. Of course it's still an hypothesis >> but worth to try. >> >> Another question: In one of the Petsc presentation, you said the Hypre >> AMG does not scale well, because set up cost amortize the iterations. How >> is it quantified? and what is the memory overhead? >> > > I said the Hypre setup cost is not scalable, but it can be amortized over > the iterations. You can quantify this > just by looking at the PCSetUp time as your increase the number of > processes. I don't think they have a good > model for the memory usage, and if they do, I do not know what it is. > However, generally Hypre takes more > memory than the agglomeration MG like ML or GAMG. 
> > Thanks, > > Matt > > >> >> Giang >> >> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> >>> Hoang Giang Bui writes: >>> >>> > Why P2/P2 is not for co-located discretization? >>> >>> Matt typed "P2/P2" when me meant "P2/P1". >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 07:57:22 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 07:57:22 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui wrote: > DO you mean the option pc_fieldsplit_block_size? In this thread: > > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error > No. "Block Size" is confusing on PETSc since it is used to do several things. Here block size is being used to split the matrix. You do not need this since you are prescribing your splits. The matrix block size is used two ways: 1) To indicate that matrix values come in logically dense blocks 2) To change the storage to match this logical arrangement After everything works, we can just indicate to the submatrix which is extracted that it has a certain block size. However, for the Laplacian I expect it not to matter. > It assumes you have a constant number of fields at each grid point, am I > right? However, my field split is not constant, like > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y > u3_z p_3 u4_x u4_y u4_z] > > Subsequently the fieldsplit is > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z > u4_x u4_y u4_z] > [p_1 p_3] > > Then what is the option to set block size 3 for split 0? > > Sorry, I search several forum threads but cannot figure out the options as > you said. > > > >> You can still do that. It can be done with options once the decomposition >> is working. Its true that these solvers >> work better with the block size set. However, if its the P2 Laplacian it >> does not really matter since its uncoupled. >> >> Yes, I agree it's uncoupled with the other field, but the crucial factor > defining the quality of the block preconditioner is the approximate > inversion of individual block. I would merely try block Jacobi first, > because it's quite simple. Nevertheless, fieldsplit implements other nice > things, like Schur complement, etc. > I think concepts are getting confused here. I was talking about the interaction of components in one block (the P2 block). You are talking about interaction between blocks. Thanks, Matt > Giang > > > > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley > wrote: > >> On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >> wrote: >> >>> Hi Matt >>> I would rather like to set the block size for block P2 too. Why? >>> >>> Because in one of my test (for problem involves only [u_x u_y u_z]), the >>> gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >>> increases to 140 if block size is 1 (see attached files). >>> >> >> You can still do that. It can be done with options once the decomposition >> is working. Its true that these solvers >> work better with the block size set. However, if its the P2 Laplacian it >> does not really matter since its uncoupled. 
>> >> This gives me the impression that AMG will give better inversion for "P2" >>> block if I can set its block size to 3. Of course it's still an hypothesis >>> but worth to try. >>> >>> Another question: In one of the Petsc presentation, you said the Hypre >>> AMG does not scale well, because set up cost amortize the iterations. How >>> is it quantified? and what is the memory overhead? >>> >> >> I said the Hypre setup cost is not scalable, but it can be amortized over >> the iterations. You can quantify this >> just by looking at the PCSetUp time as your increase the number of >> processes. I don't think they have a good >> model for the memory usage, and if they do, I do not know what it is. >> However, generally Hypre takes more >> memory than the agglomeration MG like ML or GAMG. >> >> Thanks, >> >> Matt >> >> >>> >>> Giang >>> >>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>> >>>> Hoang Giang Bui writes: >>>> >>>> > Why P2/P2 is not for co-located discretization? >>>> >>>> Matt typed "P2/P2" when me meant "P2/P1". >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Jan 22 09:27:53 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 22 Jan 2016 10:27:53 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: > > > > I said the Hypre setup cost is not scalable, > I'd be a little careful here. Scaling for the matrix triple product is hard and hypre does put effort into scaling. I don't have any data however. Do you? > but it can be amortized over the iterations. You can quantify this > just by looking at the PCSetUp time as your increase the number of > processes. I don't think they have a good > model for the memory usage, and if they do, I do not know what it is. > However, generally Hypre takes more > memory than the agglomeration MG like ML or GAMG. > > agglomerations methods tend to have lower "grid complexity", that is smaller coarse grids, than classic AMG like in hypre. THis is more of a constant complexity and not a scaling issue though. You can address this with parameters to some extent. But for elasticity, you want to at least try, if not start with, GAMG or ML. > Thanks, > > Matt > > >> >> Giang >> >> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> >>> Hoang Giang Bui writes: >>> >>> > Why P2/P2 is not for co-located discretization? >>> >>> Matt typed "P2/P2" when me meant "P2/P1". >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 09:32:52 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 09:32:52 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: > >> >> I said the Hypre setup cost is not scalable, >> > > I'd be a little careful here. Scaling for the matrix triple product is > hard and hypre does put effort into scaling. I don't have any data > however. Do you? > I used it for PyLith and saw this. I did not think any AMG had scalable setup time. Matt > but it can be amortized over the iterations. You can quantify this >> just by looking at the PCSetUp time as your increase the number of >> processes. I don't think they have a good >> model for the memory usage, and if they do, I do not know what it is. >> However, generally Hypre takes more >> memory than the agglomeration MG like ML or GAMG. >> >> > agglomerations methods tend to have lower "grid complexity", that is > smaller coarse grids, than classic AMG like in hypre. THis is more of a > constant complexity and not a scaling issue though. You can address this > with parameters to some extent. But for elasticity, you want to at least > try, if not start with, GAMG or ML. > > >> Thanks, >> >> Matt >> >> >>> >>> Giang >>> >>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>> >>>> Hoang Giang Bui writes: >>>> >>>> > Why P2/P2 is not for co-located discretization? >>>> >>>> Matt typed "P2/P2" when me meant "P2/P1". >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 10:52:27 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 11:52:27 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Dear all, I take this opportunity to ask for your important suggestion. I am solving an elastic-acoustic-gravity equation on the planet. I have displacement vector (ux,uy,uz) in solid region, displacement potential (\xi) and pressure (p) in fluid region, and gravitational potential (\phi) in all of space. All these variables are coupled. Currently, I am using MATMPIAIJ and form a single global matrix. Does using a MATMPIBIJ or MATNEST improve the convergence/efficiency in this case? For your information, total degrees of freedoms are about a billion. Any suggestion would be greatly appreciated. Thanks, Hom Nath On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: >>> >>> >>> >>> I said the Hypre setup cost is not scalable, >> >> >> I'd be a little careful here. Scaling for the matrix triple product is >> hard and hypre does put effort into scaling. I don't have any data however. >> Do you? > > > I used it for PyLith and saw this. I did not think any AMG had scalable > setup time. > > Matt > >>> >>> but it can be amortized over the iterations. You can quantify this >>> just by looking at the PCSetUp time as your increase the number of >>> processes. 
I don't think they have a good >>> model for the memory usage, and if they do, I do not know what it is. >>> However, generally Hypre takes more >>> memory than the agglomeration MG like ML or GAMG. >>> >> >> agglomerations methods tend to have lower "grid complexity", that is >> smaller coarse grids, than classic AMG like in hypre. THis is more of a >> constant complexity and not a scaling issue though. You can address this >> with parameters to some extent. But for elasticity, you want to at least >> try, if not start with, GAMG or ML. >> >>> >>> Thanks, >>> >>> Matt >>> >>>> >>>> >>>> Giang >>>> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>>>> >>>>> Hoang Giang Bui writes: >>>>> >>>>> > Why P2/P2 is not for co-located discretization? >>>>> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >>>> >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >> >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Fri Jan 22 11:01:35 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 11:01:35 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti wrote: > Dear all, > > I take this opportunity to ask for your important suggestion. > > I am solving an elastic-acoustic-gravity equation on the planet. I > have displacement vector (ux,uy,uz) in solid region, displacement > potential (\xi) and pressure (p) in fluid region, and gravitational > potential (\phi) in all of space. All these variables are coupled. > > Currently, I am using MATMPIAIJ and form a single global matrix. Does > using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > this case? For your information, total degrees of freedoms are about a > billion. > 1) For any solver question, we need to see the output of -ksp_view, and we would also like -ksp_monitor_true_residual -ksp_converged_reason 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the blocksize which you could set without that format 3) However, you might see benefit from using something like PCFIELDSPLIT if you have multiphysics here Matt > Any suggestion would be greatly appreciated. > > Thanks, > Hom Nath > > On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: > >>> > >>> > >>> > >>> I said the Hypre setup cost is not scalable, > >> > >> > >> I'd be a little careful here. Scaling for the matrix triple product is > >> hard and hypre does put effort into scaling. I don't have any data > however. > >> Do you? > > > > > > I used it for PyLith and saw this. I did not think any AMG had scalable > > setup time. > > > > Matt > > > >>> > >>> but it can be amortized over the iterations. You can quantify this > >>> just by looking at the PCSetUp time as your increase the number of > >>> processes. I don't think they have a good > >>> model for the memory usage, and if they do, I do not know what it is. > >>> However, generally Hypre takes more > >>> memory than the agglomeration MG like ML or GAMG. 
> >>> > >> > >> agglomerations methods tend to have lower "grid complexity", that is > >> smaller coarse grids, than classic AMG like in hypre. THis is more of a > >> constant complexity and not a scaling issue though. You can address > this > >> with parameters to some extent. But for elasticity, you want to at least > >> try, if not start with, GAMG or ML. > >> > >>> > >>> Thanks, > >>> > >>> Matt > >>> > >>>> > >>>> > >>>> Giang > >>>> > >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > >>>>> > >>>>> Hoang Giang Bui writes: > >>>>> > >>>>> > Why P2/P2 is not for co-located discretization? > >>>>> > >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >>>> > >>>> > >>> > >>> > >>> > >>> -- > >>> What most experimenters take for granted before they begin their > >>> experiments is infinitely more interesting than any results to which > their > >>> experiments lead. > >>> -- Norbert Wiener > >> > >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 11:10:54 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 12:10:54 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks Matt. Attached detailed info on ksp of a much smaller test. This is a multiphysics problem. Hom Nath On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > wrote: >> >> Dear all, >> >> I take this opportunity to ask for your important suggestion. >> >> I am solving an elastic-acoustic-gravity equation on the planet. I >> have displacement vector (ux,uy,uz) in solid region, displacement >> potential (\xi) and pressure (p) in fluid region, and gravitational >> potential (\phi) in all of space. All these variables are coupled. >> >> Currently, I am using MATMPIAIJ and form a single global matrix. Does >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in >> this case? For your information, total degrees of freedoms are about a >> billion. > > > 1) For any solver question, we need to see the output of -ksp_view, and we > would also like > > -ksp_monitor_true_residual -ksp_converged_reason > > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the blocksize > which you > could set without that format > > 3) However, you might see benefit from using something like PCFIELDSPLIT if > you have multiphysics here > > Matt > >> >> Any suggestion would be greatly appreciated. >> >> Thanks, >> Hom Nath >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: >> >>> >> >>> >> >>> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple product is >> >> hard and hypre does put effort into scaling. I don't have any data >> >> however. >> >> Do you? >> > >> > >> > I used it for PyLith and saw this. I did not think any AMG had scalable >> > setup time. 
>> > >> > Matt >> > >> >>> >> >>> but it can be amortized over the iterations. You can quantify this >> >>> just by looking at the PCSetUp time as your increase the number of >> >>> processes. I don't think they have a good >> >>> model for the memory usage, and if they do, I do not know what it is. >> >>> However, generally Hypre takes more >> >>> memory than the agglomeration MG like ML or GAMG. >> >>> >> >> >> >> agglomerations methods tend to have lower "grid complexity", that is >> >> smaller coarse grids, than classic AMG like in hypre. THis is more of a >> >> constant complexity and not a scaling issue though. You can address >> >> this >> >> with parameters to some extent. But for elasticity, you want to at >> >> least >> >> try, if not start with, GAMG or ML. >> >> >> >>> >> >>> Thanks, >> >>> >> >>> Matt >> >>> >> >>>> >> >>>> >> >>>> Giang >> >>>> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> >>>>> >> >>>>> Hoang Giang Bui writes: >> >>>>> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >>>>> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >>>> >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> What most experimenters take for granted before they begin their >> >>> experiments is infinitely more interesting than any results to which >> >>> their >> >>> experiments lead. >> >>> -- Norbert Wiener >> >> >> >> >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener -------------- next part -------------- A non-text attachment was scrubbed... Name: ksplog Type: application/octet-stream Size: 14041 bytes Desc: not available URL: From knepley at gmail.com Fri Jan 22 11:16:22 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 11:16:22 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti wrote: > Thanks Matt. > > Attached detailed info on ksp of a much smaller test. This is a > multiphysics problem. > You are using FGMRES/ASM(ILU0). From your description below, this sounds like an elliptic system. I would at least try AMG (-pc_type gamg) to see how it does. Any other advice would have to be based on seeing the equations. Thanks, Matt > Hom Nath > > On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > > wrote: > >> > >> Dear all, > >> > >> I take this opportunity to ask for your important suggestion. > >> > >> I am solving an elastic-acoustic-gravity equation on the planet. I > >> have displacement vector (ux,uy,uz) in solid region, displacement > >> potential (\xi) and pressure (p) in fluid region, and gravitational > >> potential (\phi) in all of space. All these variables are coupled. > >> > >> Currently, I am using MATMPIAIJ and form a single global matrix. Does > >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > >> this case? For your information, total degrees of freedoms are about a > >> billion. 
> > > > > > 1) For any solver question, we need to see the output of -ksp_view, and > we > > would also like > > > > -ksp_monitor_true_residual -ksp_converged_reason > > > > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the > blocksize > > which you > > could set without that format > > > > 3) However, you might see benefit from using something like PCFIELDSPLIT > if > > you have multiphysics here > > > > Matt > > > >> > >> Any suggestion would be greatly appreciated. > >> > >> Thanks, > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: > >> >>> > >> >>> > >> >>> > >> >>> I said the Hypre setup cost is not scalable, > >> >> > >> >> > >> >> I'd be a little careful here. Scaling for the matrix triple product > is > >> >> hard and hypre does put effort into scaling. I don't have any data > >> >> however. > >> >> Do you? > >> > > >> > > >> > I used it for PyLith and saw this. I did not think any AMG had > scalable > >> > setup time. > >> > > >> > Matt > >> > > >> >>> > >> >>> but it can be amortized over the iterations. You can quantify this > >> >>> just by looking at the PCSetUp time as your increase the number of > >> >>> processes. I don't think they have a good > >> >>> model for the memory usage, and if they do, I do not know what it > is. > >> >>> However, generally Hypre takes more > >> >>> memory than the agglomeration MG like ML or GAMG. > >> >>> > >> >> > >> >> agglomerations methods tend to have lower "grid complexity", that is > >> >> smaller coarse grids, than classic AMG like in hypre. THis is more > of a > >> >> constant complexity and not a scaling issue though. You can address > >> >> this > >> >> with parameters to some extent. But for elasticity, you want to at > >> >> least > >> >> try, if not start with, GAMG or ML. > >> >> > >> >>> > >> >>> Thanks, > >> >>> > >> >>> Matt > >> >>> > >> >>>> > >> >>>> > >> >>>> Giang > >> >>>> > >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > wrote: > >> >>>>> > >> >>>>> Hoang Giang Bui writes: > >> >>>>> > >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >>>>> > >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >> >>>> > >> >>>> > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> What most experimenters take for granted before they begin their > >> >>> experiments is infinitely more interesting than any results to which > >> >>> their > >> >>> experiments lead. > >> >>> -- Norbert Wiener > >> >> > >> >> > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 11:47:47 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 12:47:47 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks a lot. With AMG it did not converge within the iteration limit of 3000. In solid: elastic wave equation with added gravity term \rho \nabla\phi In fluid: acoustic wave equation with added gravity term \rho \nabla\phi Both solid and fluid: Poisson's equation for gravity Outer space: Laplace's equation for gravity We combine so called mapped infinite element with spectral-element method (higher order FEM that uses nodal quadrature) and solve in frequency domain. Hom Nath On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti > wrote: >> >> Thanks Matt. >> >> Attached detailed info on ksp of a much smaller test. This is a >> multiphysics problem. > > > You are using FGMRES/ASM(ILU0). From your description below, this sounds > like > an elliptic system. I would at least try AMG (-pc_type gamg) to see how it > does. Any > other advice would have to be based on seeing the equations. > > Thanks, > > Matt > >> >> Hom Nath >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> > wrote: >> >> >> >> Dear all, >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. I >> >> have displacement vector (ux,uy,uz) in solid region, displacement >> >> potential (\xi) and pressure (p) in fluid region, and gravitational >> >> potential (\phi) in all of space. All these variables are coupled. >> >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. Does >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in >> >> this case? For your information, total degrees of freedoms are about a >> >> billion. >> > >> > >> > 1) For any solver question, we need to see the output of -ksp_view, and >> > we >> > would also like >> > >> > -ksp_monitor_true_residual -ksp_converged_reason >> > >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the >> > blocksize >> > which you >> > could set without that format >> > >> > 3) However, you might see benefit from using something like PCFIELDSPLIT >> > if >> > you have multiphysics here >> > >> > Matt >> > >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> Thanks, >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: >> >> >>> >> >> >>> >> >> >>> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple product >> >> >> is >> >> >> hard and hypre does put effort into scaling. I don't have any data >> >> >> however. >> >> >> Do you? >> >> > >> >> > >> >> > I used it for PyLith and saw this. I did not think any AMG had >> >> > scalable >> >> > setup time. >> >> > >> >> > Matt >> >> > >> >> >>> >> >> >>> but it can be amortized over the iterations. You can quantify this >> >> >>> just by looking at the PCSetUp time as your increase the number of >> >> >>> processes. I don't think they have a good >> >> >>> model for the memory usage, and if they do, I do not know what it >> >> >>> is. >> >> >>> However, generally Hypre takes more >> >> >>> memory than the agglomeration MG like ML or GAMG. 
>> >> >>> >> >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", that is >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is more >> >> >> of a >> >> >> constant complexity and not a scaling issue though. You can address >> >> >> this >> >> >> with parameters to some extent. But for elasticity, you want to at >> >> >> least >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >>> >> >> >>> Thanks, >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>>> >> >> >>>> >> >> >>>> Giang >> >> >>>> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >>>> wrote: >> >> >>>>> >> >> >>>>> Hoang Giang Bui writes: >> >> >>>>> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >>>>> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >>>> >> >> >>>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> >>> experiments is infinitely more interesting than any results to >> >> >>> which >> >> >>> their >> >> >>> experiments lead. >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Fri Jan 22 12:07:04 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 12:07:04 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti wrote: > Thanks a lot. > > With AMG it did not converge within the iteration limit of 3000. > > In solid: elastic wave equation with added gravity term \rho \nabla\phi > In fluid: acoustic wave equation with added gravity term \rho \nabla\phi > Both solid and fluid: Poisson's equation for gravity > Outer space: Laplace's equation for gravity > > We combine so called mapped infinite element with spectral-element > method (higher order FEM that uses nodal quadrature) and solve in > frequency domain. > 1) The Poisson and Laplace equation should be using MG, however you are using SEM, so you would need to use a low order PC for the high order problem, also called p-MG (Paul Fischer), see http://epubs.siam.org/doi/abs/10.1137/110834512 2) The acoustic wave equation is Helmholtz to us, and that needs special MG tweaks that are still research material so I can understand using ASM. 3) Same thing for the elastic wave equations. Some people say they have this solved using hierarchical matrix methods, something like http://portal.nersc.gov/project/sparse/strumpack/ However, I think the jury is still out. If you can do 100 iterations of plain vanilla solvers, that seems like a win right now. You might improve the time using FS, but I am not sure about the iterations on the smaller problem. 
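Just to make the FS option concrete, here is a minimal sketch of attaching a
three-way split to the assembled MATMPIAIJ operator (the split names, the
index arrays, and their lengths below are placeholders; they have to come from
your own global DOF numbering, so treat this as a starting point rather than a
recipe):

#include <petscksp.h>

/* Sketch: register one split per physical field (u, chi/p, phi) on a KSP
   whose operator is the already-assembled MATMPIAIJ matrix.  The index
   arrays hold the locally owned global row numbers of each field. */
PetscErrorCode AttachFieldSplit(KSP ksp,
                                PetscInt nu,   const PetscInt iu[],
                                PetscInt nchi, const PetscInt ichi[],
                                PetscInt nphi, const PetscInt iphi[])
{
  PC             pc;
  IS             isu, ischi, isphi;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);

  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nu,   iu,   PETSC_COPY_VALUES, &isu);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nchi, ichi, PETSC_COPY_VALUES, &ischi);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nphi, iphi, PETSC_COPY_VALUES, &isphi);CHKERRQ(ierr);

  ierr = PCFieldSplitSetIS(pc, "u",   isu);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "chi", ischi);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "phi", isphi);CHKERRQ(ierr);

  /* The PC keeps its own reference to each IS. */
  ierr = ISDestroy(&isu);CHKERRQ(ierr);
  ierr = ISDestroy(&ischi);CHKERRQ(ierr);
  ierr = ISDestroy(&isphi);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

After that, each block solver is runtime-selectable through its prefix, for
example something like

  -pc_fieldsplit_type multiplicative
  -fieldsplit_u_ksp_type preonly   -fieldsplit_u_pc_type gamg
  -fieldsplit_chi_ksp_type preonly -fieldsplit_chi_pc_type asm
  -fieldsplit_phi_ksp_type preonly -fieldsplit_phi_pc_type gamg

which again is only a first guess to experiment with, not a recommendation
tuned to your particular system.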
Thanks, Matt > Hom Nath > > On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti > > wrote: > >> > >> Thanks Matt. > >> > >> Attached detailed info on ksp of a much smaller test. This is a > >> multiphysics problem. > > > > > > You are using FGMRES/ASM(ILU0). From your description below, this sounds > > like > > an elliptic system. I would at least try AMG (-pc_type gamg) to see how > it > > does. Any > > other advice would have to be based on seeing the equations. > > > > Thanks, > > > > Matt > > > >> > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti < > hng.email at gmail.com> > >> > wrote: > >> >> > >> >> Dear all, > >> >> > >> >> I take this opportunity to ask for your important suggestion. > >> >> > >> >> I am solving an elastic-acoustic-gravity equation on the planet. I > >> >> have displacement vector (ux,uy,uz) in solid region, displacement > >> >> potential (\xi) and pressure (p) in fluid region, and gravitational > >> >> potential (\phi) in all of space. All these variables are coupled. > >> >> > >> >> Currently, I am using MATMPIAIJ and form a single global matrix. Does > >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > >> >> this case? For your information, total degrees of freedoms are about > a > >> >> billion. > >> > > >> > > >> > 1) For any solver question, we need to see the output of -ksp_view, > and > >> > we > >> > would also like > >> > > >> > -ksp_monitor_true_residual -ksp_converged_reason > >> > > >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the > >> > blocksize > >> > which you > >> > could set without that format > >> > > >> > 3) However, you might see benefit from using something like > PCFIELDSPLIT > >> > if > >> > you have multiphysics here > >> > > >> > Matt > >> > > >> >> > >> >> Any suggestion would be greatly appreciated. > >> >> > >> >> Thanks, > >> >> Hom Nath > >> >> > >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > > >> >> wrote: > >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams > wrote: > >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> I said the Hypre setup cost is not scalable, > >> >> >> > >> >> >> > >> >> >> I'd be a little careful here. Scaling for the matrix triple > product > >> >> >> is > >> >> >> hard and hypre does put effort into scaling. I don't have any data > >> >> >> however. > >> >> >> Do you? > >> >> > > >> >> > > >> >> > I used it for PyLith and saw this. I did not think any AMG had > >> >> > scalable > >> >> > setup time. > >> >> > > >> >> > Matt > >> >> > > >> >> >>> > >> >> >>> but it can be amortized over the iterations. You can quantify > this > >> >> >>> just by looking at the PCSetUp time as your increase the number > of > >> >> >>> processes. I don't think they have a good > >> >> >>> model for the memory usage, and if they do, I do not know what it > >> >> >>> is. > >> >> >>> However, generally Hypre takes more > >> >> >>> memory than the agglomeration MG like ML or GAMG. > >> >> >>> > >> >> >> > >> >> >> agglomerations methods tend to have lower "grid complexity", that > is > >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is more > >> >> >> of a > >> >> >> constant complexity and not a scaling issue though. You can > address > >> >> >> this > >> >> >> with parameters to some extent. But for elasticity, you want to at > >> >> >> least > >> >> >> try, if not start with, GAMG or ML. 
> >> >> >> > >> >> >>> > >> >> >>> Thanks, > >> >> >>> > >> >> >>> Matt > >> >> >>> > >> >> >>>> > >> >> >>>> > >> >> >>>> Giang > >> >> >>>> > >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > >> >> >>>> wrote: > >> >> >>>>> > >> >> >>>>> Hoang Giang Bui writes: > >> >> >>>>> > >> >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >> >>>>> > >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >> >> >>>> > >> >> >>>> > >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> -- > >> >> >>> What most experimenters take for granted before they begin their > >> >> >>> experiments is infinitely more interesting than any results to > >> >> >>> which > >> >> >>> their > >> >> >>> experiments lead. > >> >> >>> -- Norbert Wiener > >> >> >> > >> >> >> > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 12:17:16 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 13:17:16 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks Matt for great suggestion. One last question, do you know whether the GPU capability of current PETSC version is matured enough to try for my problem? Thanks again for your help. Hom Nath On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti > wrote: >> >> Thanks a lot. >> >> With AMG it did not converge within the iteration limit of 3000. >> >> In solid: elastic wave equation with added gravity term \rho \nabla\phi >> In fluid: acoustic wave equation with added gravity term \rho \nabla\phi >> Both solid and fluid: Poisson's equation for gravity >> Outer space: Laplace's equation for gravity >> >> We combine so called mapped infinite element with spectral-element >> method (higher order FEM that uses nodal quadrature) and solve in >> frequency domain. > > > 1) The Poisson and Laplace equation should be using MG, however you are > using SEM, so > you would need to use a low order PC for the high order problem, also > called p-MG (Paul Fischer), see > > http://epubs.siam.org/doi/abs/10.1137/110834512 > > 2) The acoustic wave equation is Helmholtz to us, and that needs special MG > tweaks that > are still research material so I can understand using ASM. > > 3) Same thing for the elastic wave equations. 
Some people say they have this > solved using > hierarchical matrix methods, something like > > http://portal.nersc.gov/project/sparse/strumpack/ > > However, I think the jury is still out. > > If you can do 100 iterations of plain vanilla solvers, that seems like a win > right now. You might improve > the time using FS, but I am not sure about the iterations on the smaller > problem. > > Thanks, > > Matt > >> >> Hom Nath >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti >> > wrote: >> >> >> >> Thanks Matt. >> >> >> >> Attached detailed info on ksp of a much smaller test. This is a >> >> multiphysics problem. >> > >> > >> > You are using FGMRES/ASM(ILU0). From your description below, this sounds >> > like >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see how >> > it >> > does. Any >> > other advice would have to be based on seeing the equations. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> >> > >> >> > wrote: >> >> >> >> >> >> Dear all, >> >> >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. I >> >> >> have displacement vector (ux,uy,uz) in solid region, displacement >> >> >> potential (\xi) and pressure (p) in fluid region, and gravitational >> >> >> potential (\phi) in all of space. All these variables are coupled. >> >> >> >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. >> >> >> Does >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in >> >> >> this case? For your information, total degrees of freedoms are about >> >> >> a >> >> >> billion. >> >> > >> >> > >> >> > 1) For any solver question, we need to see the output of -ksp_view, >> >> > and >> >> > we >> >> > would also like >> >> > >> >> > -ksp_monitor_true_residual -ksp_converged_reason >> >> > >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the >> >> > blocksize >> >> > which you >> >> > could set without that format >> >> > >> >> > 3) However, you might see benefit from using something like >> >> > PCFIELDSPLIT >> >> > if >> >> > you have multiphysics here >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> >> >> Thanks, >> >> >> Hom Nath >> >> >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> >> >> >> >> wrote: >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams >> >> >> > wrote: >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple >> >> >> >> product >> >> >> >> is >> >> >> >> hard and hypre does put effort into scaling. I don't have any >> >> >> >> data >> >> >> >> however. >> >> >> >> Do you? >> >> >> > >> >> >> > >> >> >> > I used it for PyLith and saw this. I did not think any AMG had >> >> >> > scalable >> >> >> > setup time. >> >> >> > >> >> >> > Matt >> >> >> > >> >> >> >>> >> >> >> >>> but it can be amortized over the iterations. You can quantify >> >> >> >>> this >> >> >> >>> just by looking at the PCSetUp time as your increase the number >> >> >> >>> of >> >> >> >>> processes. 
I don't think they have a good >> >> >> >>> model for the memory usage, and if they do, I do not know what >> >> >> >>> it >> >> >> >>> is. >> >> >> >>> However, generally Hypre takes more >> >> >> >>> memory than the agglomeration MG like ML or GAMG. >> >> >> >>> >> >> >> >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", that >> >> >> >> is >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is >> >> >> >> more >> >> >> >> of a >> >> >> >> constant complexity and not a scaling issue though. You can >> >> >> >> address >> >> >> >> this >> >> >> >> with parameters to some extent. But for elasticity, you want to >> >> >> >> at >> >> >> >> least >> >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >> >> >>> >> >> >> >>> Thanks, >> >> >> >>> >> >> >> >>> Matt >> >> >> >>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> Giang >> >> >> >>>> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >> >>>> wrote: >> >> >> >>>>> >> >> >> >>>>> Hoang Giang Bui writes: >> >> >> >>>>> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >> >>>>> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >> >>>> >> >> >> >>>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> -- >> >> >> >>> What most experimenters take for granted before they begin their >> >> >> >>> experiments is infinitely more interesting than any results to >> >> >> >>> which >> >> >> >>> their >> >> >> >>> experiments lead. >> >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > What most experimenters take for granted before they begin their >> >> >> > experiments >> >> >> > is infinitely more interesting than any results to which their >> >> >> > experiments >> >> >> > lead. >> >> >> > -- Norbert Wiener >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From hng.email at gmail.com Fri Jan 22 14:19:10 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 15:19:10 -0500 Subject: [petsc-users] PCFIELDSPLIT question Message-ID: Dear all, I am new to PcFieldSplit. I have a matrix formed using MATMPIAIJ. Is it possible to use PCFIELDSPLIT operations in this type of matrix? Or does it have to be MATMPIBIJ or MATNEST format? If possible for MATMPIAIJ, could anybody provide me a simple example or few steps? Variables in the equations are displacement vector, scalar potential and pressure. Thanks for help. Hom Nath From mfadams at lbl.gov Fri Jan 22 14:44:15 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 22 Jan 2016 15:44:15 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: > > > I used it for PyLith and saw this. I did not think any AMG had scalable > setup time. 
> > OK, I am guessing it was scaling poorly, weak scaling, but it was sublinear after some saturation at the beginning. I have not done a weak scaling study on matrix setup (RAP primarily) ever, but I did on Prometheus in the GB work. Prometheus' RAP was pretty simple also and PETSc's is probably faster, and hence may look less scalable. I don't think there is anything fundamentally unscalable about RAP, it is just a complicated algorithm and we have never gotten around to doing all the things that we would like to do with it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 15:33:13 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 15:33:13 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti wrote: > Thanks Matt for great suggestion. One last question, do you know > whether the GPU capability of current PETSC version is matured enough > to try for my problem? > The only thing that would really make sense to do on the GPU is the SEM integration, which would not be part of PETSc. This is what SPECFEM has optimized. Thanks, Matt > Thanks again for your help. > Hom Nath > > On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti > > wrote: > >> > >> Thanks a lot. > >> > >> With AMG it did not converge within the iteration limit of 3000. > >> > >> In solid: elastic wave equation with added gravity term \rho \nabla\phi > >> In fluid: acoustic wave equation with added gravity term \rho \nabla\phi > >> Both solid and fluid: Poisson's equation for gravity > >> Outer space: Laplace's equation for gravity > >> > >> We combine so called mapped infinite element with spectral-element > >> method (higher order FEM that uses nodal quadrature) and solve in > >> frequency domain. > > > > > > 1) The Poisson and Laplace equation should be using MG, however you are > > using SEM, so > > you would need to use a low order PC for the high order problem, also > > called p-MG (Paul Fischer), see > > > > http://epubs.siam.org/doi/abs/10.1137/110834512 > > > > 2) The acoustic wave equation is Helmholtz to us, and that needs special > MG > > tweaks that > > are still research material so I can understand using ASM. > > > > 3) Same thing for the elastic wave equations. Some people say they have > this > > solved using > > hierarchical matrix methods, something like > > > > http://portal.nersc.gov/project/sparse/strumpack/ > > > > However, I think the jury is still out. > > > > If you can do 100 iterations of plain vanilla solvers, that seems like a > win > > right now. You might improve > > the time using FS, but I am not sure about the iterations on the smaller > > problem. > > > > Thanks, > > > > Matt > > > >> > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti < > hng.email at gmail.com> > >> > wrote: > >> >> > >> >> Thanks Matt. > >> >> > >> >> Attached detailed info on ksp of a much smaller test. This is a > >> >> multiphysics problem. > >> > > >> > > >> > You are using FGMRES/ASM(ILU0). From your description below, this > sounds > >> > like > >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see > how > >> > it > >> > does. 
Any > >> > other advice would have to be based on seeing the equations. > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> Hom Nath > >> >> > >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > > >> >> wrote: > >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Dear all, > >> >> >> > >> >> >> I take this opportunity to ask for your important suggestion. > >> >> >> > >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. I > >> >> >> have displacement vector (ux,uy,uz) in solid region, displacement > >> >> >> potential (\xi) and pressure (p) in fluid region, and > gravitational > >> >> >> potential (\phi) in all of space. All these variables are coupled. > >> >> >> > >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. > >> >> >> Does > >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > >> >> >> this case? For your information, total degrees of freedoms are > about > >> >> >> a > >> >> >> billion. > >> >> > > >> >> > > >> >> > 1) For any solver question, we need to see the output of -ksp_view, > >> >> > and > >> >> > we > >> >> > would also like > >> >> > > >> >> > -ksp_monitor_true_residual -ksp_converged_reason > >> >> > > >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the > >> >> > blocksize > >> >> > which you > >> >> > could set without that format > >> >> > > >> >> > 3) However, you might see benefit from using something like > >> >> > PCFIELDSPLIT > >> >> > if > >> >> > you have multiphysics here > >> >> > > >> >> > Matt > >> >> > > >> >> >> > >> >> >> Any suggestion would be greatly appreciated. > >> >> >> > >> >> >> Thanks, > >> >> >> Hom Nath > >> >> >> > >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > >> >> >> > >> >> >> wrote: > >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams > >> >> >> > wrote: > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> I said the Hypre setup cost is not scalable, > >> >> >> >> > >> >> >> >> > >> >> >> >> I'd be a little careful here. Scaling for the matrix triple > >> >> >> >> product > >> >> >> >> is > >> >> >> >> hard and hypre does put effort into scaling. I don't have any > >> >> >> >> data > >> >> >> >> however. > >> >> >> >> Do you? > >> >> >> > > >> >> >> > > >> >> >> > I used it for PyLith and saw this. I did not think any AMG had > >> >> >> > scalable > >> >> >> > setup time. > >> >> >> > > >> >> >> > Matt > >> >> >> > > >> >> >> >>> > >> >> >> >>> but it can be amortized over the iterations. You can quantify > >> >> >> >>> this > >> >> >> >>> just by looking at the PCSetUp time as your increase the > number > >> >> >> >>> of > >> >> >> >>> processes. I don't think they have a good > >> >> >> >>> model for the memory usage, and if they do, I do not know what > >> >> >> >>> it > >> >> >> >>> is. > >> >> >> >>> However, generally Hypre takes more > >> >> >> >>> memory than the agglomeration MG like ML or GAMG. > >> >> >> >>> > >> >> >> >> > >> >> >> >> agglomerations methods tend to have lower "grid complexity", > that > >> >> >> >> is > >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is > >> >> >> >> more > >> >> >> >> of a > >> >> >> >> constant complexity and not a scaling issue though. You can > >> >> >> >> address > >> >> >> >> this > >> >> >> >> with parameters to some extent. But for elasticity, you want to > >> >> >> >> at > >> >> >> >> least > >> >> >> >> try, if not start with, GAMG or ML. 
> >> >> >> >> > >> >> >> >>> > >> >> >> >>> Thanks, > >> >> >> >>> > >> >> >> >>> Matt > >> >> >> >>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> Giang > >> >> >> >>>> > >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > > >> >> >> >>>> wrote: > >> >> >> >>>>> > >> >> >> >>>>> Hoang Giang Bui writes: > >> >> >> >>>>> > >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >> >> >>>>> > >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> -- > >> >> >> >>> What most experimenters take for granted before they begin > their > >> >> >> >>> experiments is infinitely more interesting than any results to > >> >> >> >>> which > >> >> >> >>> their > >> >> >> >>> experiments lead. > >> >> >> >>> -- Norbert Wiener > >> >> >> >> > >> >> >> >> > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > What most experimenters take for granted before they begin their > >> >> >> > experiments > >> >> >> > is infinitely more interesting than any results to which their > >> >> >> > experiments > >> >> >> > lead. > >> >> >> > -- Norbert Wiener > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 15:47:13 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 16:47:13 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Hi Matt, SPECFEM currently has only an explicit time scheme and does not have full gravity implemented. I am adding implicit time scheme and full gravity so that it can be used for interesting quasistatic problems such as glacial rebound, post seismic relaxation etc. I am using Petsc as a linear solver which I would like to see GPU implemented. Thanks, Hom Nath On Fri, Jan 22, 2016 at 4:33 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti > wrote: >> >> Thanks Matt for great suggestion. One last question, do you know >> whether the GPU capability of current PETSC version is matured enough >> to try for my problem? > > > The only thing that would really make sense to do on the GPU is the SEM > integration, which > would not be part of PETSc. This is what SPECFEM has optimized. > > Thanks, > > Matt > >> >> Thanks again for your help. 
>> Hom Nath >> >> On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti >> > wrote: >> >> >> >> Thanks a lot. >> >> >> >> With AMG it did not converge within the iteration limit of 3000. >> >> >> >> In solid: elastic wave equation with added gravity term \rho \nabla\phi >> >> In fluid: acoustic wave equation with added gravity term \rho >> >> \nabla\phi >> >> Both solid and fluid: Poisson's equation for gravity >> >> Outer space: Laplace's equation for gravity >> >> >> >> We combine so called mapped infinite element with spectral-element >> >> method (higher order FEM that uses nodal quadrature) and solve in >> >> frequency domain. >> > >> > >> > 1) The Poisson and Laplace equation should be using MG, however you are >> > using SEM, so >> > you would need to use a low order PC for the high order problem, >> > also >> > called p-MG (Paul Fischer), see >> > >> > http://epubs.siam.org/doi/abs/10.1137/110834512 >> > >> > 2) The acoustic wave equation is Helmholtz to us, and that needs special >> > MG >> > tweaks that >> > are still research material so I can understand using ASM. >> > >> > 3) Same thing for the elastic wave equations. Some people say they have >> > this >> > solved using >> > hierarchical matrix methods, something like >> > >> > http://portal.nersc.gov/project/sparse/strumpack/ >> > >> > However, I think the jury is still out. >> > >> > If you can do 100 iterations of plain vanilla solvers, that seems like a >> > win >> > right now. You might improve >> > the time using FS, but I am not sure about the iterations on the smaller >> > problem. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti >> >> > >> >> > wrote: >> >> >> >> >> >> Thanks Matt. >> >> >> >> >> >> Attached detailed info on ksp of a much smaller test. This is a >> >> >> multiphysics problem. >> >> > >> >> > >> >> > You are using FGMRES/ASM(ILU0). From your description below, this >> >> > sounds >> >> > like >> >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see >> >> > how >> >> > it >> >> > does. Any >> >> > other advice would have to be based on seeing the equations. >> >> > >> >> > Thanks, >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Hom Nath >> >> >> >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> >> >> >> >> >> wrote: >> >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> Dear all, >> >> >> >> >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> >> >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. >> >> >> >> I >> >> >> >> have displacement vector (ux,uy,uz) in solid region, displacement >> >> >> >> potential (\xi) and pressure (p) in fluid region, and >> >> >> >> gravitational >> >> >> >> potential (\phi) in all of space. All these variables are >> >> >> >> coupled. >> >> >> >> >> >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. >> >> >> >> Does >> >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency >> >> >> >> in >> >> >> >> this case? For your information, total degrees of freedoms are >> >> >> >> about >> >> >> >> a >> >> >> >> billion. 
>> >> >> > >> >> >> > >> >> >> > 1) For any solver question, we need to see the output of >> >> >> > -ksp_view, >> >> >> > and >> >> >> > we >> >> >> > would also like >> >> >> > >> >> >> > -ksp_monitor_true_residual -ksp_converged_reason >> >> >> > >> >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the >> >> >> > blocksize >> >> >> > which you >> >> >> > could set without that format >> >> >> > >> >> >> > 3) However, you might see benefit from using something like >> >> >> > PCFIELDSPLIT >> >> >> > if >> >> >> > you have multiphysics here >> >> >> > >> >> >> > Matt >> >> >> > >> >> >> >> >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> >> >> >> >> Thanks, >> >> >> >> Hom Nath >> >> >> >> >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> >> >> >> >> >> >> wrote: >> >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams >> >> >> >> > wrote: >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple >> >> >> >> >> product >> >> >> >> >> is >> >> >> >> >> hard and hypre does put effort into scaling. I don't have any >> >> >> >> >> data >> >> >> >> >> however. >> >> >> >> >> Do you? >> >> >> >> > >> >> >> >> > >> >> >> >> > I used it for PyLith and saw this. I did not think any AMG had >> >> >> >> > scalable >> >> >> >> > setup time. >> >> >> >> > >> >> >> >> > Matt >> >> >> >> > >> >> >> >> >>> >> >> >> >> >>> but it can be amortized over the iterations. You can quantify >> >> >> >> >>> this >> >> >> >> >>> just by looking at the PCSetUp time as your increase the >> >> >> >> >>> number >> >> >> >> >>> of >> >> >> >> >>> processes. I don't think they have a good >> >> >> >> >>> model for the memory usage, and if they do, I do not know >> >> >> >> >>> what >> >> >> >> >>> it >> >> >> >> >>> is. >> >> >> >> >>> However, generally Hypre takes more >> >> >> >> >>> memory than the agglomeration MG like ML or GAMG. >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", >> >> >> >> >> that >> >> >> >> >> is >> >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is >> >> >> >> >> more >> >> >> >> >> of a >> >> >> >> >> constant complexity and not a scaling issue though. You can >> >> >> >> >> address >> >> >> >> >> this >> >> >> >> >> with parameters to some extent. But for elasticity, you want >> >> >> >> >> to >> >> >> >> >> at >> >> >> >> >> least >> >> >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >>> Thanks, >> >> >> >> >>> >> >> >> >> >>> Matt >> >> >> >> >>> >> >> >> >> >>>> >> >> >> >> >>>> >> >> >> >> >>>> Giang >> >> >> >> >>>> >> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >> >> >>>> >> >> >> >> >>>> wrote: >> >> >> >> >>>>> >> >> >> >> >>>>> Hoang Giang Bui writes: >> >> >> >> >>>>> >> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >> >> >>>>> >> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >> >> >>>> >> >> >> >> >>>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> -- >> >> >> >> >>> What most experimenters take for granted before they begin >> >> >> >> >>> their >> >> >> >> >>> experiments is infinitely more interesting than any results >> >> >> >> >>> to >> >> >> >> >>> which >> >> >> >> >>> their >> >> >> >> >>> experiments lead. 
>> >> >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > -- >> >> >> >> > What most experimenters take for granted before they begin >> >> >> >> > their >> >> >> >> > experiments >> >> >> >> > is infinitely more interesting than any results to which their >> >> >> >> > experiments >> >> >> >> > lead. >> >> >> >> > -- Norbert Wiener >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > What most experimenters take for granted before they begin their >> >> >> > experiments >> >> >> > is infinitely more interesting than any results to which their >> >> >> > experiments >> >> >> > lead. >> >> >> > -- Norbert Wiener >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Fri Jan 22 16:06:09 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 16:06:09 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 3:47 PM, Hom Nath Gharti wrote: > Hi Matt, > > SPECFEM currently has only an explicit time scheme and does not have > full gravity implemented. I am adding implicit time scheme and full > gravity so that it can be used for interesting quasistatic problems > such as glacial rebound, post seismic relaxation etc. I am using Petsc > as a linear solver which I would like to see GPU implemented. > Why? It really does not make sense for those operations. It is an unfortunate fact, but the usefulness of GPUs has been oversold. You can certainly get some mileage out of a SpMV on the GPU, but there the maximum win is maybe 2x or less for a nice CPU, and then you have to account for transfer time and other latencies. Unless you have a really compelling case, I would not waste your time. To come to this opinion, I used years of my own time looking at GPUs. Thanks, Matt > Thanks, > Hom Nath > > On Fri, Jan 22, 2016 at 4:33 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti > > wrote: > >> > >> Thanks Matt for great suggestion. One last question, do you know > >> whether the GPU capability of current PETSC version is matured enough > >> to try for my problem? > > > > > > The only thing that would really make sense to do on the GPU is the SEM > > integration, which > > would not be part of PETSc. This is what SPECFEM has optimized. > > > > Thanks, > > > > Matt > > > >> > >> Thanks again for your help. > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti < > hng.email at gmail.com> > >> > wrote: > >> >> > >> >> Thanks a lot. > >> >> > >> >> With AMG it did not converge within the iteration limit of 3000. 
> >> >> > >> >> In solid: elastic wave equation with added gravity term \rho > \nabla\phi > >> >> In fluid: acoustic wave equation with added gravity term \rho > >> >> \nabla\phi > >> >> Both solid and fluid: Poisson's equation for gravity > >> >> Outer space: Laplace's equation for gravity > >> >> > >> >> We combine so called mapped infinite element with spectral-element > >> >> method (higher order FEM that uses nodal quadrature) and solve in > >> >> frequency domain. > >> > > >> > > >> > 1) The Poisson and Laplace equation should be using MG, however you > are > >> > using SEM, so > >> > you would need to use a low order PC for the high order problem, > >> > also > >> > called p-MG (Paul Fischer), see > >> > > >> > http://epubs.siam.org/doi/abs/10.1137/110834512 > >> > > >> > 2) The acoustic wave equation is Helmholtz to us, and that needs > special > >> > MG > >> > tweaks that > >> > are still research material so I can understand using ASM. > >> > > >> > 3) Same thing for the elastic wave equations. Some people say they > have > >> > this > >> > solved using > >> > hierarchical matrix methods, something like > >> > > >> > http://portal.nersc.gov/project/sparse/strumpack/ > >> > > >> > However, I think the jury is still out. > >> > > >> > If you can do 100 iterations of plain vanilla solvers, that seems > like a > >> > win > >> > right now. You might improve > >> > the time using FS, but I am not sure about the iterations on the > smaller > >> > problem. > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> Hom Nath > >> >> > >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley > > >> >> wrote: > >> >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Thanks Matt. > >> >> >> > >> >> >> Attached detailed info on ksp of a much smaller test. This is a > >> >> >> multiphysics problem. > >> >> > > >> >> > > >> >> > You are using FGMRES/ASM(ILU0). From your description below, this > >> >> > sounds > >> >> > like > >> >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see > >> >> > how > >> >> > it > >> >> > does. Any > >> >> > other advice would have to be based on seeing the equations. > >> >> > > >> >> > Thanks, > >> >> > > >> >> > Matt > >> >> > > >> >> >> > >> >> >> Hom Nath > >> >> >> > >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > >> >> >> > >> >> >> wrote: > >> >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > >> >> >> > > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> Dear all, > >> >> >> >> > >> >> >> >> I take this opportunity to ask for your important suggestion. > >> >> >> >> > >> >> >> >> I am solving an elastic-acoustic-gravity equation on the > planet. > >> >> >> >> I > >> >> >> >> have displacement vector (ux,uy,uz) in solid region, > displacement > >> >> >> >> potential (\xi) and pressure (p) in fluid region, and > >> >> >> >> gravitational > >> >> >> >> potential (\phi) in all of space. All these variables are > >> >> >> >> coupled. > >> >> >> >> > >> >> >> >> Currently, I am using MATMPIAIJ and form a single global > matrix. > >> >> >> >> Does > >> >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency > >> >> >> >> in > >> >> >> >> this case? For your information, total degrees of freedoms are > >> >> >> >> about > >> >> >> >> a > >> >> >> >> billion. 
> >> >> >> > > >> >> >> > > >> >> >> > 1) For any solver question, we need to see the output of > >> >> >> > -ksp_view, > >> >> >> > and > >> >> >> > we > >> >> >> > would also like > >> >> >> > > >> >> >> > -ksp_monitor_true_residual -ksp_converged_reason > >> >> >> > > >> >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in > the > >> >> >> > blocksize > >> >> >> > which you > >> >> >> > could set without that format > >> >> >> > > >> >> >> > 3) However, you might see benefit from using something like > >> >> >> > PCFIELDSPLIT > >> >> >> > if > >> >> >> > you have multiphysics here > >> >> >> > > >> >> >> > Matt > >> >> >> > > >> >> >> >> > >> >> >> >> Any suggestion would be greatly appreciated. > >> >> >> >> > >> >> >> >> Thanks, > >> >> >> >> Hom Nath > >> >> >> >> > >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > >> >> >> >> > >> >> >> >> wrote: > >> >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams > > >> >> >> >> > wrote: > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> I said the Hypre setup cost is not scalable, > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple > >> >> >> >> >> product > >> >> >> >> >> is > >> >> >> >> >> hard and hypre does put effort into scaling. I don't have > any > >> >> >> >> >> data > >> >> >> >> >> however. > >> >> >> >> >> Do you? > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > I used it for PyLith and saw this. I did not think any AMG > had > >> >> >> >> > scalable > >> >> >> >> > setup time. > >> >> >> >> > > >> >> >> >> > Matt > >> >> >> >> > > >> >> >> >> >>> > >> >> >> >> >>> but it can be amortized over the iterations. You can > quantify > >> >> >> >> >>> this > >> >> >> >> >>> just by looking at the PCSetUp time as your increase the > >> >> >> >> >>> number > >> >> >> >> >>> of > >> >> >> >> >>> processes. I don't think they have a good > >> >> >> >> >>> model for the memory usage, and if they do, I do not know > >> >> >> >> >>> what > >> >> >> >> >>> it > >> >> >> >> >>> is. > >> >> >> >> >>> However, generally Hypre takes more > >> >> >> >> >>> memory than the agglomeration MG like ML or GAMG. > >> >> >> >> >>> > >> >> >> >> >> > >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", > >> >> >> >> >> that > >> >> >> >> >> is > >> >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis > is > >> >> >> >> >> more > >> >> >> >> >> of a > >> >> >> >> >> constant complexity and not a scaling issue though. You can > >> >> >> >> >> address > >> >> >> >> >> this > >> >> >> >> >> with parameters to some extent. But for elasticity, you want > >> >> >> >> >> to > >> >> >> >> >> at > >> >> >> >> >> least > >> >> >> >> >> try, if not start with, GAMG or ML. > >> >> >> >> >> > >> >> >> >> >>> > >> >> >> >> >>> Thanks, > >> >> >> >> >>> > >> >> >> >> >>> Matt > >> >> >> >> >>> > >> >> >> >> >>>> > >> >> >> >> >>>> > >> >> >> >> >>>> Giang > >> >> >> >> >>>> > >> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > >> >> >> >> >>>> > >> >> >> >> >>>> wrote: > >> >> >> >> >>>>> > >> >> >> >> >>>>> Hoang Giang Bui writes: > >> >> >> >> >>>>> > >> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >> >> >> >>>>> > >> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". 
> >> >> >> >> >>>> > >> >> >> >> >>>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> -- > >> >> >> >> >>> What most experimenters take for granted before they begin > >> >> >> >> >>> their > >> >> >> >> >>> experiments is infinitely more interesting than any results > >> >> >> >> >>> to > >> >> >> >> >>> which > >> >> >> >> >>> their > >> >> >> >> >>> experiments lead. > >> >> >> >> >>> -- Norbert Wiener > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > -- > >> >> >> >> > What most experimenters take for granted before they begin > >> >> >> >> > their > >> >> >> >> > experiments > >> >> >> >> > is infinitely more interesting than any results to which > their > >> >> >> >> > experiments > >> >> >> >> > lead. > >> >> >> >> > -- Norbert Wiener > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > What most experimenters take for granted before they begin their > >> >> >> > experiments > >> >> >> > is infinitely more interesting than any results to which their > >> >> >> > experiments > >> >> >> > lead. > >> >> >> > -- Norbert Wiener > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 16:11:14 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 17:11:14 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks for your suggestions! If it's just 2X, I will not waste my time! Hom Nath On Fri, Jan 22, 2016 at 5:06 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 3:47 PM, Hom Nath Gharti > wrote: >> >> Hi Matt, >> >> SPECFEM currently has only an explicit time scheme and does not have >> full gravity implemented. I am adding implicit time scheme and full >> gravity so that it can be used for interesting quasistatic problems >> such as glacial rebound, post seismic relaxation etc. I am using Petsc >> as a linear solver which I would like to see GPU implemented. > > > Why? It really does not make sense for those operations. > > It is an unfortunate fact, but the usefulness of GPUs has been oversold. You > can certainly > get some mileage out of a SpMV on the GPU, but there the maximum win is > maybe 2x or > less for a nice CPU, and then you have to account for transfer time and > other latencies. > Unless you have a really compelling case, I would not waste your time. 
> > To come to this opinion, I used years of my own time looking at GPUs. > > Thanks, > > Matt > >> >> Thanks, >> Hom Nath >> >> On Fri, Jan 22, 2016 at 4:33 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti >> > wrote: >> >> >> >> Thanks Matt for great suggestion. One last question, do you know >> >> whether the GPU capability of current PETSC version is matured enough >> >> to try for my problem? >> > >> > >> > The only thing that would really make sense to do on the GPU is the SEM >> > integration, which >> > would not be part of PETSc. This is what SPECFEM has optimized. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Thanks again for your help. >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti >> >> > >> >> > wrote: >> >> >> >> >> >> Thanks a lot. >> >> >> >> >> >> With AMG it did not converge within the iteration limit of 3000. >> >> >> >> >> >> In solid: elastic wave equation with added gravity term \rho >> >> >> \nabla\phi >> >> >> In fluid: acoustic wave equation with added gravity term \rho >> >> >> \nabla\phi >> >> >> Both solid and fluid: Poisson's equation for gravity >> >> >> Outer space: Laplace's equation for gravity >> >> >> >> >> >> We combine so called mapped infinite element with spectral-element >> >> >> method (higher order FEM that uses nodal quadrature) and solve in >> >> >> frequency domain. >> >> > >> >> > >> >> > 1) The Poisson and Laplace equation should be using MG, however you >> >> > are >> >> > using SEM, so >> >> > you would need to use a low order PC for the high order problem, >> >> > also >> >> > called p-MG (Paul Fischer), see >> >> > >> >> > http://epubs.siam.org/doi/abs/10.1137/110834512 >> >> > >> >> > 2) The acoustic wave equation is Helmholtz to us, and that needs >> >> > special >> >> > MG >> >> > tweaks that >> >> > are still research material so I can understand using ASM. >> >> > >> >> > 3) Same thing for the elastic wave equations. Some people say they >> >> > have >> >> > this >> >> > solved using >> >> > hierarchical matrix methods, something like >> >> > >> >> > http://portal.nersc.gov/project/sparse/strumpack/ >> >> > >> >> > However, I think the jury is still out. >> >> > >> >> > If you can do 100 iterations of plain vanilla solvers, that seems >> >> > like a >> >> > win >> >> > right now. You might improve >> >> > the time using FS, but I am not sure about the iterations on the >> >> > smaller >> >> > problem. >> >> > >> >> > Thanks, >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Hom Nath >> >> >> >> >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley >> >> >> >> >> >> wrote: >> >> >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> Thanks Matt. >> >> >> >> >> >> >> >> Attached detailed info on ksp of a much smaller test. This is a >> >> >> >> multiphysics problem. >> >> >> > >> >> >> > >> >> >> > You are using FGMRES/ASM(ILU0). From your description below, this >> >> >> > sounds >> >> >> > like >> >> >> > an elliptic system. I would at least try AMG (-pc_type gamg) to >> >> >> > see >> >> >> > how >> >> >> > it >> >> >> > does. Any >> >> >> > other advice would have to be based on seeing the equations. 
>> >> >> > >> >> >> > Thanks, >> >> >> > >> >> >> > Matt >> >> >> > >> >> >> >> >> >> >> >> Hom Nath >> >> >> >> >> >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> >> >> >> >> >> >> >> wrote: >> >> >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> >> >> >> > >> >> >> >> > wrote: >> >> >> >> >> >> >> >> >> >> Dear all, >> >> >> >> >> >> >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> >> >> >> >> >> >> I am solving an elastic-acoustic-gravity equation on the >> >> >> >> >> planet. >> >> >> >> >> I >> >> >> >> >> have displacement vector (ux,uy,uz) in solid region, >> >> >> >> >> displacement >> >> >> >> >> potential (\xi) and pressure (p) in fluid region, and >> >> >> >> >> gravitational >> >> >> >> >> potential (\phi) in all of space. All these variables are >> >> >> >> >> coupled. >> >> >> >> >> >> >> >> >> >> Currently, I am using MATMPIAIJ and form a single global >> >> >> >> >> matrix. >> >> >> >> >> Does >> >> >> >> >> using a MATMPIBIJ or MATNEST improve the >> >> >> >> >> convergence/efficiency >> >> >> >> >> in >> >> >> >> >> this case? For your information, total degrees of freedoms are >> >> >> >> >> about >> >> >> >> >> a >> >> >> >> >> billion. >> >> >> >> > >> >> >> >> > >> >> >> >> > 1) For any solver question, we need to see the output of >> >> >> >> > -ksp_view, >> >> >> >> > and >> >> >> >> > we >> >> >> >> > would also like >> >> >> >> > >> >> >> >> > -ksp_monitor_true_residual -ksp_converged_reason >> >> >> >> > >> >> >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in >> >> >> >> > the >> >> >> >> > blocksize >> >> >> >> > which you >> >> >> >> > could set without that format >> >> >> >> > >> >> >> >> > 3) However, you might see benefit from using something like >> >> >> >> > PCFIELDSPLIT >> >> >> >> > if >> >> >> >> > you have multiphysics here >> >> >> >> > >> >> >> >> > Matt >> >> >> >> > >> >> >> >> >> >> >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> >> >> >> >> >> >> Thanks, >> >> >> >> >> Hom Nath >> >> >> >> >> >> >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> >> >> >> >> >> >> >> >> wrote: >> >> >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams >> >> >> >> >> > >> >> >> >> >> > wrote: >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix >> >> >> >> >> >> triple >> >> >> >> >> >> product >> >> >> >> >> >> is >> >> >> >> >> >> hard and hypre does put effort into scaling. I don't have >> >> >> >> >> >> any >> >> >> >> >> >> data >> >> >> >> >> >> however. >> >> >> >> >> >> Do you? >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > I used it for PyLith and saw this. I did not think any AMG >> >> >> >> >> > had >> >> >> >> >> > scalable >> >> >> >> >> > setup time. >> >> >> >> >> > >> >> >> >> >> > Matt >> >> >> >> >> > >> >> >> >> >> >>> >> >> >> >> >> >>> but it can be amortized over the iterations. You can >> >> >> >> >> >>> quantify >> >> >> >> >> >>> this >> >> >> >> >> >>> just by looking at the PCSetUp time as your increase the >> >> >> >> >> >>> number >> >> >> >> >> >>> of >> >> >> >> >> >>> processes. I don't think they have a good >> >> >> >> >> >>> model for the memory usage, and if they do, I do not know >> >> >> >> >> >>> what >> >> >> >> >> >>> it >> >> >> >> >> >>> is. 
>> >> >> >> >> >>> However, generally Hypre takes more >> >> >> >> >> >>> memory than the agglomeration MG like ML or GAMG. >> >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >> >> >> agglomerations methods tend to have lower "grid >> >> >> >> >> >> complexity", >> >> >> >> >> >> that >> >> >> >> >> >> is >> >> >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis >> >> >> >> >> >> is >> >> >> >> >> >> more >> >> >> >> >> >> of a >> >> >> >> >> >> constant complexity and not a scaling issue though. You >> >> >> >> >> >> can >> >> >> >> >> >> address >> >> >> >> >> >> this >> >> >> >> >> >> with parameters to some extent. But for elasticity, you >> >> >> >> >> >> want >> >> >> >> >> >> to >> >> >> >> >> >> at >> >> >> >> >> >> least >> >> >> >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >> >>> Thanks, >> >> >> >> >> >>> >> >> >> >> >> >>> Matt >> >> >> >> >> >>> >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >> >>>> Giang >> >> >> >> >> >>>> >> >> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >> >> >> >>>> >> >> >> >> >> >>>> wrote: >> >> >> >> >> >>>>> >> >> >> >> >> >>>>> Hoang Giang Bui writes: >> >> >> >> >> >>>>> >> >> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >> >> >> >>>>> >> >> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> -- >> >> >> >> >> >>> What most experimenters take for granted before they begin >> >> >> >> >> >>> their >> >> >> >> >> >>> experiments is infinitely more interesting than any >> >> >> >> >> >>> results >> >> >> >> >> >>> to >> >> >> >> >> >>> which >> >> >> >> >> >>> their >> >> >> >> >> >>> experiments lead. >> >> >> >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > -- >> >> >> >> >> > What most experimenters take for granted before they begin >> >> >> >> >> > their >> >> >> >> >> > experiments >> >> >> >> >> > is infinitely more interesting than any results to which >> >> >> >> >> > their >> >> >> >> >> > experiments >> >> >> >> >> > lead. >> >> >> >> >> > -- Norbert Wiener >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > -- >> >> >> >> > What most experimenters take for granted before they begin >> >> >> >> > their >> >> >> >> > experiments >> >> >> >> > is infinitely more interesting than any results to which their >> >> >> >> > experiments >> >> >> >> > lead. >> >> >> >> > -- Norbert Wiener >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > What most experimenters take for granted before they begin their >> >> >> > experiments >> >> >> > is infinitely more interesting than any results to which their >> >> >> > experiments >> >> >> > lead. >> >> >> > -- Norbert Wiener >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. 
>> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From jychang48 at gmail.com Fri Jan 22 16:27:00 2016 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 22 Jan 2016 15:27:00 -0700 Subject: [petsc-users] Optimization methods in PETSc/TAO Message-ID: Hi all, Consider the following problem: minimize 1/2 - subject to c >= 0 (P1) To solve (P1) using TAO, I recall that there were two recommended solvers to use: TRON and BLMVM I recently got reviews for this paper of mine that uses BLMVM and got hammered for this, as I quote, "convenient yet inadequate choice" of solver. It was suggested that I use either semi smooth Newton methods or projected Newton methods for the optimization problem. My question is, are these methodologies/solvers available currently within PETSc/TAO? 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year ago but it didn't seem to work. I believe I was told by one of the PETSc developers (Matt?) that this was not the one to use? 2) Is TRON a type of projected Newton method? I know it's an active-set Newton trust region, but is this a well-accepted high performing optimization method to use? I was also referred to ROL: https://trilinos.org/packages/rol but I am guessing this isn't accessible/downloadable from petsc at the moment? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 16:42:20 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 16:42:20 -0600 Subject: [petsc-users] Optimization methods in PETSc/TAO In-Reply-To: References: Message-ID: On Fri, Jan 22, 2016 at 4:27 PM, Justin Chang wrote: > Hi all, > > Consider the following problem: > > minimize 1/2 - > subject to c >= 0 (P1) > > To solve (P1) using TAO, I recall that there were two recommended solvers > to use: TRON and BLMVM > > I recently got reviews for this paper of mine that uses BLMVM and got > hammered for this, as I quote, "convenient yet inadequate choice" of > solver. > If they did not back this up with a citation it is just empty snobbery, not surprising from some quarters. > It was suggested that I use either semi smooth Newton methods or > projected Newton methods for the optimization problem. My question is, are > these methodologies/solvers available currently within PETSc/TAO? > You can Google TRON and BLMVM and they come up on the NEOS pages. BLMVM is a gradient descent method, but TRON is a Newton method, so trying it may silence the doubters. Matt > 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year > ago but it didn't seem to work. I believe I was told by one of the PETSc > developers (Matt?) that this was not the one to use? > > 2) Is TRON a type of projected Newton method? I know it's an active-set > Newton trust region, but is this a well-accepted high performing > optimization method to use? > > I was also referred to ROL: https://trilinos.org/packages/rol but I am > guessing this isn't accessible/downloadable from petsc at the moment? > > Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jychang48 at gmail.com Fri Jan 22 16:57:45 2016 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 22 Jan 2016 15:57:45 -0700 Subject: [petsc-users] Optimization methods in PETSc/TAO In-Reply-To: References: Message-ID: This was one of the citations provided: M. Ulbrich, "Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces", SIAM, 2011, Haven't looked into this in detail, but is what's described in that equivalent to the SNESVINEWTONSSLS? On Fri, Jan 22, 2016 at 3:42 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 4:27 PM, Justin Chang wrote: > >> Hi all, >> >> Consider the following problem: >> >> minimize 1/2 - >> subject to c >= 0 (P1) >> >> To solve (P1) using TAO, I recall that there were two recommended solvers >> to use: TRON and BLMVM >> >> I recently got reviews for this paper of mine that uses BLMVM and got >> hammered for this, as I quote, "convenient yet inadequate choice" of >> solver. >> > > If they did not back this up with a citation it is just empty snobbery, > not surprising from some quarters. > > >> It was suggested that I use either semi smooth Newton methods or >> projected Newton methods for the optimization problem. My question is, are >> these methodologies/solvers available currently within PETSc/TAO? >> > > You can Google TRON and BLMVM and they come up on the NEOS pages. BLMVM is > a gradient descent method, but > TRON is a Newton method, so trying it may silence the doubters. > > Matt > > >> 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year >> ago but it didn't seem to work. I believe I was told by one of the PETSc >> developers (Matt?) that this was not the one to use? >> >> 2) Is TRON a type of projected Newton method? I know it's an active-set >> Newton trust region, but is this a well-accepted high performing >> optimization method to use? >> >> I was also referred to ROL: https://trilinos.org/packages/rol but I am >> guessing this isn't accessible/downloadable from petsc at the moment? >> >> Thanks, >> Justin >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Jan 24 05:26:24 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 24 Jan 2016 05:26:24 -0600 Subject: [petsc-users] PCFIELDSPLIT question In-Reply-To: References: Message-ID: On Fri, Jan 22, 2016 at 2:19 PM, Hom Nath Gharti wrote: > Dear all, > > I am new to PcFieldSplit. > > I have a matrix formed using MATMPIAIJ. Is it possible to use > PCFIELDSPLIT operations in this type of matrix? Or does it have to be > MATMPIBIJ or MATNEST format? > Yes, you can split AIJ. > If possible for MATMPIAIJ, could anybody provide me a simple example > or few steps? Variables in the equations are displacement vector, > scalar potential and pressure. > If you do not have a collocated discretization, then you have to use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetIS.html Thanks, Matt > Thanks for help. > > Hom Nath > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hng.email at gmail.com Sun Jan 24 18:14:34 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Sun, 24 Jan 2016 19:14:34 -0500 Subject: [petsc-users] PCFIELDSPLIT question In-Reply-To: References: Message-ID: Thank you so much Matt! I will try. Hom Nath On Sun, Jan 24, 2016 at 6:26 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 2:19 PM, Hom Nath Gharti > wrote: >> >> Dear all, >> >> I am new to PcFieldSplit. >> >> I have a matrix formed using MATMPIAIJ. Is it possible to use >> PCFIELDSPLIT operations in this type of matrix? Or does it have to be >> MATMPIBIJ or MATNEST format? > > > Yes, you can split AIJ. > >> >> If possible for MATMPIAIJ, could anybody provide me a simple example >> or few steps? Variables in the equations are displacement vector, >> scalar potential and pressure. > > > If you do not have a collocated discretization, then you have to use > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetIS.html > > Thanks, > > Matt > >> >> Thanks for help. >> >> Hom Nath > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From praveenpetsc at gmail.com Mon Jan 25 00:26:32 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Mon, 25 Jan 2016 11:56:32 +0530 Subject: [petsc-users] error message from GDB Message-ID: I am employing PETSc for DD in existing serial fortran code. the program ran for few seconds and showed segmentation fault core dumped. would anyone suggest how to fix this error message from GDB: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff64f7cc0 in PetscCheckPointer (ptr=0x15e00007fff, dtype=PETSC_OBJECT) at /home/praveen/petsc/src/sys/error/checkptr.c:106 106 PETSC_UNUSED volatile PetscClassId classid = ((PetscObject)ptr)->classid; -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 25 00:41:07 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 25 Jan 2016 00:41:07 -0600 Subject: [petsc-users] [petsc-maint] error message from GDB In-Reply-To: References: Message-ID: We will need to see a stack trace. You can use the debugger, -start_in_debugger, to get one. Also, always send the whole error message. Thanks, Matt On Mon, Jan 25, 2016 at 12:26 AM, praveen kumar wrote: > I am employing PETSc for DD in existing serial fortran code. the program > ran for few seconds and showed segmentation fault core dumped. would anyone > suggest how to fix this > error message from GDB: > > Program received signal SIGSEGV, Segmentation fault. > 0x00007ffff64f7cc0 in PetscCheckPointer (ptr=0x15e00007fff, > dtype=PETSC_OBJECT) at /home/praveen/petsc/src/sys/error/checkptr.c:106 > 106 PETSC_UNUSED volatile PetscClassId classid = > ((PetscObject)ptr)->classid; > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From praveenpetsc at gmail.com Mon Jan 25 04:26:40 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Mon, 25 Jan 2016 15:56:40 +0530 Subject: [petsc-users] [petsc-maint] error message from GDB In-Reply-To: References: Message-ID: Thanks Matt. I ran with -start_in_debugger and fixed the error. 
The code is running but I can't figure out why the results are wrong when compared with serial code. if you get time, please go through the code. it is a simple 2D conduction code and I?ve employed DMDAcreate2D. Thanks, Praveen On Mon, Jan 25, 2016 at 12:11 PM, Matthew Knepley wrote: > We will need to see a stack trace. You can use the debugger, > -start_in_debugger, to get one. > > Also, always send the whole error message. > > Thanks, > > Matt > > On Mon, Jan 25, 2016 at 12:26 AM, praveen kumar > wrote: > >> I am employing PETSc for DD in existing serial fortran code. the program >> ran for few seconds and showed segmentation fault core dumped. would anyone >> suggest how to fix this >> error message from GDB: >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x00007ffff64f7cc0 in PetscCheckPointer (ptr=0x15e00007fff, >> dtype=PETSC_OBJECT) at /home/praveen/petsc/src/sys/error/checkptr.c:106 >> 106 PETSC_UNUSED volatile PetscClassId classid = >> ((PetscObject)ptr)->classid; >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.F90 Type: text/x-fortran Size: 14263 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: input Type: application/octet-stream Size: 454 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 515 bytes Desc: not available URL: From zocca.marco at gmail.com Mon Jan 25 04:34:00 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 25 Jan 2016 11:34:00 +0100 Subject: [petsc-users] erratic bug with nested PETSc and SLEPc Message-ID: Dear all, I have a simple code with a matrix filling, assembly and output to stdout. This in turn is wrapped within a SLEPc bracket, and that in turn in a PETSc bracket, both called with default options (`XInitializeNoArguments`). I run this on my laptop, MPI comm size == 1. Issue: _sometimes_ the above crashes upon exit from the SLEPc bracket (i.e. after printing out the matrix), with an error code > 8000 . I haven't found this documented anywhere. It's funny because this doesn't happen with probability 1 and no conditions change (running from a makefile). Other times it simply works. In general, when using both PETSc and SLEPc functionality, is it enough to link to SLEPc alone? Does it take care of importing all of PETSc? Any hints re. this behaviour? Thank you in advance and kind regards, Marco From torquil at gmail.com Mon Jan 25 04:58:41 2016 From: torquil at gmail.com (=?UTF-8?Q?Torquil_Macdonald_S=c3=b8rensen?=) Date: Mon, 25 Jan 2016 11:58:41 +0100 Subject: [petsc-users] Question about TSSetIJacobian examples Message-ID: <56A5FFE1.6030402@gmail.com> Hi! I have been looking at some of the PETSc examples where TSSetIJacobian, and there is one thing which is unclear to me. Consider e.g. the example: http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex8.c.html In the function RoberJacobian(), CEJacobian(), OregoJacobian(), there are two matrix function arguments A and B. The matrix A is the one that is actually set in the code. 
My question is: what is the purpose of MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY); when A != B at the end of the function? How does that piece of code affect B? In the documentation of these functions it says that they are to be called after e.g. MatSetValues. But MatSetValues have not been called on B in those functions, so that's why I'm wondering what those lines are for. Best regards, Torquil S?rensen From jroman at dsic.upv.es Mon Jan 25 05:03:01 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 25 Jan 2016 12:03:01 +0100 Subject: [petsc-users] erratic bug with nested PETSc and SLEPc In-Reply-To: References: Message-ID: > El 25 ene 2016, a las 11:34, Marco Zocca escribi?: > > Dear all, > > I have a simple code with a matrix filling, assembly and output to stdout. > This in turn is wrapped within a SLEPc bracket, and that in turn in a > PETSc bracket, both called with default options > (`XInitializeNoArguments`). > > I run this on my laptop, MPI comm size == 1. > > Issue: _sometimes_ the above crashes upon exit from the SLEPc bracket > (i.e. after printing out the matrix), with an error code > 8000 . I > haven't found this documented anywhere. > It's funny because this doesn't happen with probability 1 and no > conditions change (running from a makefile). > > Other times it simply works. > > In general, when using both PETSc and SLEPc functionality, is it > enough to link to SLEPc alone? Does it take care of importing all of > PETSc? > > Any hints re. this behaviour? > > > Thank you in advance and kind regards, > > Marco SLEPc makefiles include PETSc makefiles. Generally you just need to include ${SLEPC_DIR}/lib/slepc/conf/slepc_common and then add e.g. ${SLEPC_EPS_LIB} in your link line, which in turn will add ${PETSC_KSP_LIB}. [Note: if you need other components of PETSc such as SNES or TS you may need to add these, but it is usually not necessary unless PETSc has been configured --with-single-library=0]. If you call both SlepcInitialize() and PetscInitialize(), in any order, it should work. It should work also with the NoArguments versions. So I don't know where the problem is. If you share a test code I could try to reproduce the problem. Jose From torquil at gmail.com Mon Jan 25 05:09:09 2016 From: torquil at gmail.com (=?UTF-8?Q?Torquil_Macdonald_S=c3=b8rensen?=) Date: Mon, 25 Jan 2016 12:09:09 +0100 Subject: [petsc-users] Question about TSSetIJacobian examples In-Reply-To: <56A5FFE1.6030402@gmail.com> References: <56A5FFE1.6030402@gmail.com> Message-ID: <56A60255.5000908@gmail.com> Sorry, I meant: what is the reason for MatAssemblyBegin/End being run for matrix A, in the case when A != B? Best regards, Torquil S?rensen On 25/01/16 11:58, Torquil Macdonald S?rensen wrote: > Hi! > > I have been looking at some of the PETSc examples where TSSetIJacobian, > and there is one thing which is unclear to me. Consider e.g. the example: > > http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex8.c.html > > In the function RoberJacobian(), CEJacobian(), OregoJacobian(), there > are two matrix function arguments A and B. The matrix A is the one that > is actually set in the code. My question is: what is the purpose of > > MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY); > > when A != B at the end of the function? How does that piece of code > affect B? In the documentation of these functions it says that they are > to be called after e.g. MatSetValues. 
But MatSetValues have not been > called on B in those functions, so that's why I'm wondering what those > lines are for. > > Best regards, > Torquil S?rensen > From hgbk2008 at gmail.com Mon Jan 25 11:13:58 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 25 Jan 2016 18:13:58 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: OK, let's come back to my problem. I got your point about the interaction between components in one block. In my case, the interaction is strong. As you said, I try this: ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); CHKERRQ(ierr); ksp_U = sub_ksp[0]; ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); ierr = PetscFree(sub_ksp); CHKERRQ(ierr); But it seems doesn't work. The output from -ksp_view shows that matrix passed to Hypre still has bs=1 KSP Object: (fieldsplit_u_) 8 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_u_) 8 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type PMIS HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: (fieldsplit_u_) 8 MPI processes type: mpiaij rows=792333, cols=792333 total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 30057 nodes, limit used is 5 In other test, I can see the block size bs=3 in the section of Mat Object Regardless the setup cost of Hypre AMG, I saw it gives quite a radical performance, providing that the material parameters does not vary strongly, and the geometry is regular enough. Giang On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui > wrote: > >> DO you mean the option pc_fieldsplit_block_size? In this thread: >> >> http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error >> > > No. "Block Size" is confusing on PETSc since it is used to do several > things. Here block size > is being used to split the matrix. You do not need this since you are > prescribing your splits. 
The > matrix block size is used two ways: > > 1) To indicate that matrix values come in logically dense blocks > > 2) To change the storage to match this logical arrangement > > After everything works, we can just indicate to the submatrix which is > extracted that it has a > certain block size. However, for the Laplacian I expect it not to matter. > > >> It assumes you have a constant number of fields at each grid point, am I >> right? However, my field split is not constant, like >> [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y >> u3_z p_3 u4_x u4_y u4_z] >> >> Subsequently the fieldsplit is >> [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z >> u4_x u4_y u4_z] >> [p_1 p_3] >> >> Then what is the option to set block size 3 for split 0? >> >> Sorry, I search several forum threads but cannot figure out the options >> as you said. >> >> >> >>> You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> work better with the block size set. However, if its the P2 Laplacian it >>> does not really matter since its uncoupled. >>> >>> Yes, I agree it's uncoupled with the other field, but the crucial factor >> defining the quality of the block preconditioner is the approximate >> inversion of individual block. I would merely try block Jacobi first, >> because it's quite simple. Nevertheless, fieldsplit implements other nice >> things, like Schur complement, etc. >> > > I think concepts are getting confused here. I was talking about the > interaction of components in one block (the P2 block). You > are talking about interaction between blocks. > > Thanks, > > Matt > > >> Giang >> >> >> >> On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley >> wrote: >> >>> On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >>> wrote: >>> >>>> Hi Matt >>>> I would rather like to set the block size for block P2 too. Why? >>>> >>>> Because in one of my test (for problem involves only [u_x u_y u_z]), >>>> the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >>>> increases to 140 if block size is 1 (see attached files). >>>> >>> >>> You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> work better with the block size set. However, if its the P2 Laplacian it >>> does not really matter since its uncoupled. >>> >>> This gives me the impression that AMG will give better inversion for >>>> "P2" block if I can set its block size to 3. Of course it's still an >>>> hypothesis but worth to try. >>>> >>>> Another question: In one of the Petsc presentation, you said the Hypre >>>> AMG does not scale well, because set up cost amortize the iterations. How >>>> is it quantified? and what is the memory overhead? >>>> >>> >>> I said the Hypre setup cost is not scalable, but it can be amortized >>> over the iterations. You can quantify this >>> just by looking at the PCSetUp time as your increase the number of >>> processes. I don't think they have a good >>> model for the memory usage, and if they do, I do not know what it is. >>> However, generally Hypre takes more >>> memory than the agglomeration MG like ML or GAMG. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> >>>> Giang >>>> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>>> >>>>> Hoang Giang Bui writes: >>>>> >>>>> > Why P2/P2 is not for co-located discretization? >>>>> >>>>> Matt typed "P2/P2" when me meant "P2/P1". 
>>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 25 11:34:01 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 25 Jan 2016 11:34:01 -0600 Subject: [petsc-users] Question about TSSetIJacobian examples In-Reply-To: <56A5FFE1.6030402@gmail.com> References: <56A5FFE1.6030402@gmail.com> Message-ID: <5BE437F5-F68D-454D-BA55-345581C53C63@mcs.anl.gov> The reason for the MatAssembly... on A when A is not B is when using a matrix-free A. For example -snes_mf_operator Recall that matrix free matrix vector products with finite differences are computed with F(U + alpha*dx) - F(U) J(U)*dx = ----------------------------------- alpha*dx dx, of course, is different for each call to the multiply. Each new Newton step uses a new U. The MatAssemblyBegin/End() is when the matrix free matrix A is informed of the new U value (otherwise even with new Newton steps the original U from the first Newton step would be used forever); this is handled internally by the MatCreateSNESMF() object. Barry > On Jan 25, 2016, at 4:58 AM, Torquil Macdonald S?rensen wrote: > > Hi! > > I have been looking at some of the PETSc examples where TSSetIJacobian, > and there is one thing which is unclear to me. Consider e.g. the example: > > http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex8.c.html > > In the function RoberJacobian(), CEJacobian(), OregoJacobian(), there > are two matrix function arguments A and B. The matrix A is the one that > is actually set in the code. My question is: what is the purpose of > > MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY); > > when A != B at the end of the function? How does that piece of code > affect B? In the documentation of these functions it says that they are > to be called after e.g. MatSetValues. But MatSetValues have not been > called on B in those functions, so that's why I'm wondering what those > lines are for. > > Best regards, > Torquil S?rensen > From kalan019 at umn.edu Mon Jan 25 12:15:04 2016 From: kalan019 at umn.edu (Vasileios Kalantzis) Date: Mon, 25 Jan 2016 12:15:04 -0600 Subject: [petsc-users] MatSetValues Message-ID: Dear all, I am trying to form an approximation of the Schur complement S = C-E'*(B\E) of a matrix A = [B, E ; E', C]. Matrix C is stored as a distributed Mat object while matrices B and E are locally distributed to each processor (the block partitioning comes from a Domain Decomposition point-of-view). All matrices B, E, and C are sparse. I already have a sparse version of -E'*(B\E) computed. Moreover, -E'*(B\E) is block-diagonal. The only issue now is how to merge (add) C and -E'*(B\E). The way i do the addition right now is based on checking every entry of -E'*(B\E) and, if larger than a threshold value, add it to C using the MatSetValue routine. The above is being done in parallel for each diagonal block of -E'*(B\E). The code works fine numerically but my approach is too slow if -E'*(B\E) is not highly sparse. 
I know that I can set many entries together by using the MatSetValues routine but I am not sure how to do it because the sparsity pattern of each column of -E'*(B\E) differs. Maybe I can assemble the sparsified Schur complement column-by-column using MatSetValues but is there any other idea perhaps? Thanks ! :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 25 12:43:26 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 25 Jan 2016 12:43:26 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui wrote: > > OK, let's come back to my problem. I got your point about the interaction between components in one block. In my case, the interaction is strong. > > As you said, I try this: > > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); CHKERRQ(ierr); > ksp_U = sub_ksp[0]; > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); > > But it seems doesn't work. The output from -ksp_view shows that matrix passed to Hypre still has bs=1 Hmm, this is strange. MatSetBlockSize() should have either set the block size to 3 or generated an error. Can you run in the debugger on one process and put a break point in MatSetBlockSize() and see what it is setting the block size to. Then in PCSetUp_hypre() you can see what it is passing to hypre as the block size and maybe figure out how it becomes 1. Barry > > KSP Object: (fieldsplit_u_) 8 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_u_) 8 MPI processes > type: hypre > HYPRE BoomerAMG preconditioning > HYPRE BoomerAMG: Cycle type V > HYPRE BoomerAMG: Maximum number of levels 25 > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > HYPRE BoomerAMG: Threshold for strong coupling 0.25 > HYPRE BoomerAMG: Interpolation truncation factor 0 > HYPRE BoomerAMG: Interpolation: max elements per row 0 > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > HYPRE BoomerAMG: Maximum row sums 0.9 > HYPRE BoomerAMG: Sweeps down 1 > HYPRE BoomerAMG: Sweeps up 1 > HYPRE BoomerAMG: Sweeps on coarse 1 > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > HYPRE BoomerAMG: Relax weight (all) 1 > HYPRE BoomerAMG: Outer relax weight (all) 1 > HYPRE BoomerAMG: Using CF-relaxation > HYPRE BoomerAMG: Measure type local > HYPRE BoomerAMG: Coarsen type PMIS > HYPRE BoomerAMG: Interpolation type classical > linear system matrix = precond matrix: > Mat Object: (fieldsplit_u_) 8 MPI processes > type: mpiaij > rows=792333, cols=792333 > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 30057 nodes, limit used is 5 > > In other test, I can see the block size bs=3 in the section of Mat Object > > 
Regardless the setup cost of Hypre AMG, I saw it gives quite a radical performance, providing that the material parameters does not vary strongly, and the geometry is regular enough. > > > Giang > > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui wrote: > DO you mean the option pc_fieldsplit_block_size? In this thread: > > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error > > No. "Block Size" is confusing on PETSc since it is used to do several things. Here block size > is being used to split the matrix. You do not need this since you are prescribing your splits. The > matrix block size is used two ways: > > 1) To indicate that matrix values come in logically dense blocks > > 2) To change the storage to match this logical arrangement > > After everything works, we can just indicate to the submatrix which is extracted that it has a > certain block size. However, for the Laplacian I expect it not to matter. > > It assumes you have a constant number of fields at each grid point, am I right? However, my field split is not constant, like > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y u3_z p_3 u4_x u4_y u4_z] > > Subsequently the fieldsplit is > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z u4_x u4_y u4_z] > [p_1 p_3] > > Then what is the option to set block size 3 for split 0? > > Sorry, I search several forum threads but cannot figure out the options as you said. > > > > You can still do that. It can be done with options once the decomposition is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it does not really matter since its uncoupled. > > Yes, I agree it's uncoupled with the other field, but the crucial factor defining the quality of the block preconditioner is the approximate inversion of individual block. I would merely try block Jacobi first, because it's quite simple. Nevertheless, fieldsplit implements other nice things, like Schur complement, etc. > > I think concepts are getting confused here. I was talking about the interaction of components in one block (the P2 block). You > are talking about interaction between blocks. > > Thanks, > > Matt > > Giang > > > > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui wrote: > Hi Matt > I would rather like to set the block size for block P2 too. Why? > > Because in one of my test (for problem involves only [u_x u_y u_z]), the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it increases to 140 if block size is 1 (see attached files). > > You can still do that. It can be done with options once the decomposition is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it does not really matter since its uncoupled. > > This gives me the impression that AMG will give better inversion for "P2" block if I can set its block size to 3. Of course it's still an hypothesis but worth to try. > > Another question: In one of the Petsc presentation, you said the Hypre AMG does not scale well, because set up cost amortize the iterations. How is it quantified? and what is the memory overhead? > > I said the Hypre setup cost is not scalable, but it can be amortized over the iterations. You can quantify this > just by looking at the PCSetUp time as your increase the number of processes. I don't think they have a good > model for the memory usage, and if they do, I do not know what it is. 
However, generally Hypre takes more > memory than the agglomeration MG like ML or GAMG. > > Thanks, > > Matt > > > Giang > > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Why P2/P2 is not for co-located discretization? > > Matt typed "P2/P2" when me meant "P2/P1". > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From bsmith at mcs.anl.gov Mon Jan 25 13:07:33 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 25 Jan 2016 13:07:33 -0600 Subject: [petsc-users] MatSetValues In-Reply-To: References: Message-ID: <73951966-F40A-4E10-99B6-8D925EAD78D5@mcs.anl.gov> > On Jan 25, 2016, at 12:15 PM, Vasileios Kalantzis wrote: > > Dear all, > > I am trying to form an approximation of the > Schur complement S = C-E'*(B\E) of a matrix > A = [B, E ; E', C]. Matrix C is stored as a > distributed Mat object while matrices B and E > are locally distributed to each processor (the > block partitioning comes from a Domain > Decomposition point-of-view). All matrices B, E, > and C are sparse. > > I already have a sparse version of -E'*(B\E) > computed. Moreover, -E'*(B\E) is block-diagonal. > The only issue now is how to merge (add) C and > -E'*(B\E). The way i do the addition right now is > based on checking every entry of -E'*(B\E) and, > if larger than a threshold value, add it to C using > the MatSetValue routine. The above is being done > in parallel for each diagonal block of -E'*(B\E). > > The code works fine numerically but my approach > is too slow if -E'*(B\E) is not highly sparse. This is a guess, but if you inserting new nonzero locations into C with MatSetValues() then this will be very slow. Are you inserting new locations? If so, here is what you need to do. Sweep through the rows of C/-E'*(B\E) determining the number of nonzeros that will be in the result and then use MatCreateMPIAIJ() or MatMPIAIJSetPreallocation() to preallocate the space in a new matrix, say D. Then sweep through all the rows again actually calling the MatSetValues() and put the entries into D. Switching from non-preallocation to preallocation will speed it up dramatically (factors of 100's or more) if you were inserting new locations. Barry > > I know that I can set many entries together by > using the MatSetValues routine but I am not > sure how to do it because the sparsity pattern of > each column of -E'*(B\E) differs. Maybe I can > assemble the sparsified Schur complement > column-by-column using MatSetValues but is > there any other idea perhaps? > > Thanks ! :) From kalan019 at umn.edu Mon Jan 25 13:37:58 2016 From: kalan019 at umn.edu (Vasileios Kalantzis) Date: Mon, 25 Jan 2016 13:37:58 -0600 Subject: [petsc-users] MatSetValues In-Reply-To: <73951966-F40A-4E10-99B6-8D925EAD78D5@mcs.anl.gov> References: <73951966-F40A-4E10-99B6-8D925EAD78D5@mcs.anl.gov> Message-ID: Hi Barry, yes, I am inserting new locations. I was actually copying matrix C to a new matrix D, and then I was inserting the "thresholded" values -E*(B\E) in this matrix D. I was pretty much sure that the missing pre-allocation was the reason for the slow code -- when I commented out the MatSetValues() part (i.e. 
I approximated the Schur complement only by matrix C), this part was much much faster. I will definitely follow your suggestion -- Thanks!!! On Mon, Jan 25, 2016 at 1:07 PM, Barry Smith wrote: > > > On Jan 25, 2016, at 12:15 PM, Vasileios Kalantzis > wrote: > > > > Dear all, > > > > I am trying to form an approximation of the > > Schur complement S = C-E'*(B\E) of a matrix > > A = [B, E ; E', C]. Matrix C is stored as a > > distributed Mat object while matrices B and E > > are locally distributed to each processor (the > > block partitioning comes from a Domain > > Decomposition point-of-view). All matrices B, E, > > and C are sparse. > > > > I already have a sparse version of -E'*(B\E) > > computed. Moreover, -E'*(B\E) is block-diagonal. > > The only issue now is how to merge (add) C and > > -E'*(B\E). The way i do the addition right now is > > based on checking every entry of -E'*(B\E) and, > > if larger than a threshold value, add it to C using > > the MatSetValue routine. The above is being done > > in parallel for each diagonal block of -E'*(B\E). > > > > The code works fine numerically but my approach > > is too slow if -E'*(B\E) is not highly sparse. > > This is a guess, but if you inserting new nonzero locations into C with > MatSetValues() then this will be very slow. Are you inserting new locations? > > If so, here is what you need to do. Sweep through the rows of > C/-E'*(B\E) determining the number of nonzeros that will be in the result > and then use MatCreateMPIAIJ() or MatMPIAIJSetPreallocation() to > preallocate the space in a new matrix, say D. Then sweep through all the > rows again actually calling the MatSetValues() and put the entries into D. > Switching from non-preallocation to preallocation will speed it up > dramatically (factors of 100's or more) if you were inserting new locations. > > Barry > > > > > > > I know that I can set many entries together by > > using the MatSetValues routine but I am not > > sure how to do it because the sparsity pattern of > > each column of -E'*(B\E) differs. Maybe I can > > assemble the sparsified Schur complement > > column-by-column using MatSetValues but is > > there any other idea perhaps? > > > > Thanks ! :) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Jan 26 02:58:20 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 26 Jan 2016 09:58:20 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Hi I assert this line to the hypre.c to see what block size it set to /* special case for BoomerAMG */ if (jac->setup == HYPRE_BoomerAMGSetup) { ierr = MatGetBlockSize(pc->pmat,&bs);CHKERRQ(ierr); // check block size passed to HYPRE PetscPrintf(PetscObjectComm((PetscObject)pc),"the block size passed to HYPRE is %d\n",bs); if (bs > 1) PetscStackCallStandard(HYPRE_BoomerAMGSetNumFunctions,(jac->hsolver,bs)); } It shows that the passing block size is 1. So my hypothesis is correct. In the manual of MatSetBlockSize ( http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetBlockSize.html), it has to be called before MatSetUp. Hence I guess the matrix passed to HYPRE is created before I set the block size. 
Given that, I set the block size after the call to PCFieldSplitSetIS ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr); /* Set block size for sub-matrix, */ ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); CHKERRQ(ierr); ksp_U = sub_ksp[0]; ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); I guess the sub-matrices is created at PCFieldSplitSetIS. If that's correct then it's not possible to set the block size this way. Giang On Mon, Jan 25, 2016 at 7:43 PM, Barry Smith wrote: > > > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui > wrote: > > > > OK, let's come back to my problem. I got your point about the > interaction between components in one block. In my case, the interaction is > strong. > > > > As you said, I try this: > > > > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); > CHKERRQ(ierr); > > ksp_U = sub_ksp[0]; > > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); > > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); > > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); > > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); > > > > But it seems doesn't work. The output from -ksp_view shows that matrix > passed to Hypre still has bs=1 > > Hmm, this is strange. MatSetBlockSize() should have either set the > block size to 3 or generated an error. Can you run in the debugger on one > process and put a break point in MatSetBlockSize() and see what it is > setting the block size to. Then in PCSetUp_hypre() you can see what it is > passing to hypre as the block size and maybe figure out how it becomes 1. > > Barry > > > > > > KSP Object: (fieldsplit_u_) 8 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (fieldsplit_u_) 8 MPI processes > > type: hypre > > HYPRE BoomerAMG preconditioning > > HYPRE BoomerAMG: Cycle type V > > HYPRE BoomerAMG: Maximum number of levels 25 > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > HYPRE BoomerAMG: Threshold for strong coupling 0.25 > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > HYPRE BoomerAMG: Maximum row sums 0.9 > > HYPRE BoomerAMG: Sweeps down 1 > > HYPRE BoomerAMG: Sweeps up 1 > > HYPRE BoomerAMG: Sweeps on coarse 1 > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > HYPRE BoomerAMG: Relax weight (all) 1 > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > HYPRE BoomerAMG: Using CF-relaxation > > HYPRE BoomerAMG: Measure type local > > HYPRE BoomerAMG: Coarsen type PMIS > > HYPRE BoomerAMG: Interpolation type classical > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_u_) 8 MPI processes > > type: mpiaij > > rows=792333, cols=792333 > > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 30057 nodes, limit > used is 5 > > > > In 
other test, I can see the block size bs=3 in the section of Mat Object > > > > Regardless the setup cost of Hypre AMG, I saw it gives quite a radical > performance, providing that the material parameters does not vary strongly, > and the geometry is regular enough. > > > > > > Giang > > > > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui > wrote: > > DO you mean the option pc_fieldsplit_block_size? In this thread: > > > > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error > > > > No. "Block Size" is confusing on PETSc since it is used to do several > things. Here block size > > is being used to split the matrix. You do not need this since you are > prescribing your splits. The > > matrix block size is used two ways: > > > > 1) To indicate that matrix values come in logically dense blocks > > > > 2) To change the storage to match this logical arrangement > > > > After everything works, we can just indicate to the submatrix which is > extracted that it has a > > certain block size. However, for the Laplacian I expect it not to matter. > > > > It assumes you have a constant number of fields at each grid point, am I > right? However, my field split is not constant, like > > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y > u3_z p_3 u4_x u4_y u4_z] > > > > Subsequently the fieldsplit is > > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z > u4_x u4_y u4_z] > > [p_1 p_3] > > > > Then what is the option to set block size 3 for split 0? > > > > Sorry, I search several forum threads but cannot figure out the options > as you said. > > > > > > > > You can still do that. It can be done with options once the > decomposition is working. Its true that these solvers > > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > > > Yes, I agree it's uncoupled with the other field, but the crucial factor > defining the quality of the block preconditioner is the approximate > inversion of individual block. I would merely try block Jacobi first, > because it's quite simple. Nevertheless, fieldsplit implements other nice > things, like Schur complement, etc. > > > > I think concepts are getting confused here. I was talking about the > interaction of components in one block (the P2 block). You > > are talking about interaction between blocks. > > > > Thanks, > > > > Matt > > > > Giang > > > > > > > > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui > wrote: > > Hi Matt > > I would rather like to set the block size for block P2 too. Why? > > > > Because in one of my test (for problem involves only [u_x u_y u_z]), the > gmres + Hypre AMG converges in 50 steps with block size 3, whereby it > increases to 140 if block size is 1 (see attached files). > > > > You can still do that. It can be done with options once the > decomposition is working. Its true that these solvers > > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > > > This gives me the impression that AMG will give better inversion for > "P2" block if I can set its block size to 3. Of course it's still an > hypothesis but worth to try. > > > > Another question: In one of the Petsc presentation, you said the Hypre > AMG does not scale well, because set up cost amortize the iterations. How > is it quantified? and what is the memory overhead? 
> > > > I said the Hypre setup cost is not scalable, but it can be amortized > over the iterations. You can quantify this > > just by looking at the PCSetUp time as your increase the number of > processes. I don't think they have a good > > model for the memory usage, and if they do, I do not know what it is. > However, generally Hypre takes more > > memory than the agglomeration MG like ML or GAMG. > > > > Thanks, > > > > Matt > > > > > > Giang > > > > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > > Hoang Giang Bui writes: > > > > > Why P2/P2 is not for co-located discretization? > > > > Matt typed "P2/P2" when me meant "P2/P1". > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Jan 26 08:01:49 2016 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 26 Jan 2016 09:01:49 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Tue, Jan 26, 2016 at 3:58 AM, Hoang Giang Bui wrote: > Hi > > I assert this line to the hypre.c to see what block size it set to > > /* special case for BoomerAMG */ > if (jac->setup == HYPRE_BoomerAMGSetup) { > ierr = MatGetBlockSize(pc->pmat,&bs);CHKERRQ(ierr); > > // check block size passed to HYPRE > PetscPrintf(PetscObjectComm((PetscObject)pc),"the block size passed to > HYPRE is %d\n",bs); > > if (bs > 1) > PetscStackCallStandard(HYPRE_BoomerAMGSetNumFunctions,(jac->hsolver,bs)); > } > > It shows that the passing block size is 1. So my hypothesis is correct. > > In the manual of MatSetBlockSize ( > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetBlockSize.html), > it has to be called before MatSetUp. Hence I guess the matrix passed to > HYPRE is created before I set the block size. Given that, I set the block > size after the call to PCFieldSplitSetIS > > ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr); > ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr); > > /* > Set block size for sub-matrix, > */ > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); > CHKERRQ(ierr); > ksp_U = sub_ksp[0]; > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); > > I guess the sub-matrices is created at PCFieldSplitSetIS. If that's > correct then it's not possible to set the block size this way. > You set the block size in the ISs that you give to FieldSplit. FieldSplit will give it to the matrices. > > > Giang > > On Mon, Jan 25, 2016 at 7:43 PM, Barry Smith wrote: > >> >> > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui >> wrote: >> > >> > OK, let's come back to my problem. I got your point about the >> interaction between components in one block. In my case, the interaction is >> strong. 
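A sketch of the advice above about putting the block size on the index sets rather than on the extracted submatrices: if the IS handed to PCFieldSplitSetIS() already carries bs=3, the "u" submatrix created by fieldsplit should inherit it, and PCSetUp_hypre() should then pick that up and pass it to HYPRE_BoomerAMGSetNumFunctions(), as in the hypre.c snippet above. The IS_u/IS_p names are the ones from that snippet, and the sketch assumes IS_u lists the displacement dofs interlaced as (u_x,u_y,u_z) so its local length is divisible by 3.

  ierr = ISSetBlockSize(IS_u, 3); CHKERRQ(ierr);   /* displacement dofs come in (x,y,z) triples */
  ierr = ISSetBlockSize(IS_p, 1); CHKERRQ(ierr);   /* scalar pressure dofs */
  ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr);

Done this way there is no need to call MatSetBlockSize() on A_U/P_U after PCFieldSplitGetSubKSP(), which comes too late because the submatrices have already been created.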
>> > >> > As you said, I try this: >> > >> > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); >> > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); >> CHKERRQ(ierr); >> > ksp_U = sub_ksp[0]; >> > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); >> > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); >> > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); >> > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); >> > >> > But it seems doesn't work. The output from -ksp_view shows that matrix >> passed to Hypre still has bs=1 >> >> Hmm, this is strange. MatSetBlockSize() should have either set the >> block size to 3 or generated an error. Can you run in the debugger on one >> process and put a break point in MatSetBlockSize() and see what it is >> setting the block size to. Then in PCSetUp_hypre() you can see what it is >> passing to hypre as the block size and maybe figure out how it becomes 1. >> >> Barry >> >> >> > >> > KSP Object: (fieldsplit_u_) 8 MPI processes >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> > left preconditioning >> > using NONE norm type for convergence test >> > PC Object: (fieldsplit_u_) 8 MPI processes >> > type: hypre >> > HYPRE BoomerAMG preconditioning >> > HYPRE BoomerAMG: Cycle type V >> > HYPRE BoomerAMG: Maximum number of levels 25 >> > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 >> > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 >> > HYPRE BoomerAMG: Threshold for strong coupling 0.25 >> > HYPRE BoomerAMG: Interpolation truncation factor 0 >> > HYPRE BoomerAMG: Interpolation: max elements per row 0 >> > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 >> > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 >> > HYPRE BoomerAMG: Maximum row sums 0.9 >> > HYPRE BoomerAMG: Sweeps down 1 >> > HYPRE BoomerAMG: Sweeps up 1 >> > HYPRE BoomerAMG: Sweeps on coarse 1 >> > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi >> > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi >> > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination >> > HYPRE BoomerAMG: Relax weight (all) 1 >> > HYPRE BoomerAMG: Outer relax weight (all) 1 >> > HYPRE BoomerAMG: Using CF-relaxation >> > HYPRE BoomerAMG: Measure type local >> > HYPRE BoomerAMG: Coarsen type PMIS >> > HYPRE BoomerAMG: Interpolation type classical >> > linear system matrix = precond matrix: >> > Mat Object: (fieldsplit_u_) 8 MPI processes >> > type: mpiaij >> > rows=792333, cols=792333 >> > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 >> > total number of mallocs used during MatSetValues calls =0 >> > using I-node (on process 0) routines: found 30057 nodes, >> limit used is 5 >> > >> > In other test, I can see the block size bs=3 in the section of Mat >> Object >> > >> > Regardless the setup cost of Hypre AMG, I saw it gives quite a radical >> performance, providing that the material parameters does not vary strongly, >> and the geometry is regular enough. >> > >> > >> > Giang >> > >> > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui >> wrote: >> > DO you mean the option pc_fieldsplit_block_size? In this thread: >> > >> > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error >> > >> > No. "Block Size" is confusing on PETSc since it is used to do several >> things. Here block size >> > is being used to split the matrix. You do not need this since you are >> prescribing your splits. 
The >> > matrix block size is used two ways: >> > >> > 1) To indicate that matrix values come in logically dense blocks >> > >> > 2) To change the storage to match this logical arrangement >> > >> > After everything works, we can just indicate to the submatrix which is >> extracted that it has a >> > certain block size. However, for the Laplacian I expect it not to >> matter. >> > >> > It assumes you have a constant number of fields at each grid point, am >> I right? However, my field split is not constant, like >> > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y >> u3_z p_3 u4_x u4_y u4_z] >> > >> > Subsequently the fieldsplit is >> > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z >> u4_x u4_y u4_z] >> > [p_1 p_3] >> > >> > Then what is the option to set block size 3 for split 0? >> > >> > Sorry, I search several forum threads but cannot figure out the options >> as you said. >> > >> > >> > >> > You can still do that. It can be done with options once the >> decomposition is working. Its true that these solvers >> > work better with the block size set. However, if its the P2 Laplacian >> it does not really matter since its uncoupled. >> > >> > Yes, I agree it's uncoupled with the other field, but the crucial >> factor defining the quality of the block preconditioner is the approximate >> inversion of individual block. I would merely try block Jacobi first, >> because it's quite simple. Nevertheless, fieldsplit implements other nice >> things, like Schur complement, etc. >> > >> > I think concepts are getting confused here. I was talking about the >> interaction of components in one block (the P2 block). You >> > are talking about interaction between blocks. >> > >> > Thanks, >> > >> > Matt >> > >> > Giang >> > >> > >> > >> > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >> wrote: >> > Hi Matt >> > I would rather like to set the block size for block P2 too. Why? >> > >> > Because in one of my test (for problem involves only [u_x u_y u_z]), >> the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >> increases to 140 if block size is 1 (see attached files). >> > >> > You can still do that. It can be done with options once the >> decomposition is working. Its true that these solvers >> > work better with the block size set. However, if its the P2 Laplacian >> it does not really matter since its uncoupled. >> > >> > This gives me the impression that AMG will give better inversion for >> "P2" block if I can set its block size to 3. Of course it's still an >> hypothesis but worth to try. >> > >> > Another question: In one of the Petsc presentation, you said the Hypre >> AMG does not scale well, because set up cost amortize the iterations. How >> is it quantified? and what is the memory overhead? >> > >> > I said the Hypre setup cost is not scalable, but it can be amortized >> over the iterations. You can quantify this >> > just by looking at the PCSetUp time as your increase the number of >> processes. I don't think they have a good >> > model for the memory usage, and if they do, I do not know what it is. >> However, generally Hypre takes more >> > memory than the agglomeration MG like ML or GAMG. >> > >> > Thanks, >> > >> > Matt >> > >> > >> > Giang >> > >> > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> > Hoang Giang Bui writes: >> > >> > > Why P2/P2 is not for co-located discretization? >> > >> > Matt typed "P2/P2" when me meant "P2/P1". 
>> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Jan 26 11:41:08 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 26 Jan 2016 18:41:08 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Clear enough. Thank you :-) Giang On Tue, Jan 26, 2016 at 3:01 PM, Mark Adams wrote: > > > On Tue, Jan 26, 2016 at 3:58 AM, Hoang Giang Bui > wrote: > >> Hi >> >> I assert this line to the hypre.c to see what block size it set to >> >> /* special case for BoomerAMG */ >> if (jac->setup == HYPRE_BoomerAMGSetup) { >> ierr = MatGetBlockSize(pc->pmat,&bs);CHKERRQ(ierr); >> >> // check block size passed to HYPRE >> PetscPrintf(PetscObjectComm((PetscObject)pc),"the block size passed >> to HYPRE is %d\n",bs); >> >> if (bs > 1) >> PetscStackCallStandard(HYPRE_BoomerAMGSetNumFunctions,(jac->hsolver,bs)); >> } >> >> It shows that the passing block size is 1. So my hypothesis is correct. >> >> In the manual of MatSetBlockSize ( >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetBlockSize.html), >> it has to be called before MatSetUp. Hence I guess the matrix passed to >> HYPRE is created before I set the block size. Given that, I set the block >> size after the call to PCFieldSplitSetIS >> >> ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr); >> ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr); >> >> /* >> Set block size for sub-matrix, >> */ >> ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); >> CHKERRQ(ierr); >> ksp_U = sub_ksp[0]; >> ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); >> ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); >> ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); >> >> I guess the sub-matrices is created at PCFieldSplitSetIS. If that's >> correct then it's not possible to set the block size this way. >> > > You set the block size in the ISs that you give to FieldSplit. FieldSplit > will give it to the matrices. > > >> >> >> Giang >> >> On Mon, Jan 25, 2016 at 7:43 PM, Barry Smith wrote: >> >>> >>> > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui >>> wrote: >>> > >>> > OK, let's come back to my problem. I got your point about the >>> interaction between components in one block. In my case, the interaction is >>> strong. >>> > >>> > As you said, I try this: >>> > >>> > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); >>> > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); >>> CHKERRQ(ierr); >>> > ksp_U = sub_ksp[0]; >>> > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); >>> > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); >>> > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); >>> > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); >>> > >>> > But it seems doesn't work. The output from -ksp_view shows that matrix >>> passed to Hypre still has bs=1 >>> >>> Hmm, this is strange. MatSetBlockSize() should have either set the >>> block size to 3 or generated an error. 
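One way to see what actually got set is to query the submatrix directly right before the solve (a small sketch, reusing the A_U handle obtained from PCFieldSplitGetSubKSP() in the snippet above):

  PetscInt bs;
  ierr = MatGetBlockSize(A_U, &bs); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "block size of the u submatrix: %D\n", bs); CHKERRQ(ierr);

If this prints 1, the MatSetBlockSize() call never took effect on the matrix that ends up in PCSetUp_hypre().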
Can you run in the debugger on one >>> process and put a break point in MatSetBlockSize() and see what it is >>> setting the block size to. Then in PCSetUp_hypre() you can see what it is >>> passing to hypre as the block size and maybe figure out how it becomes 1. >>> >>> Barry >>> >>> >>> > >>> > KSP Object: (fieldsplit_u_) 8 MPI processes >>> > type: preonly >>> > maximum iterations=10000, initial guess is zero >>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >>> > left preconditioning >>> > using NONE norm type for convergence test >>> > PC Object: (fieldsplit_u_) 8 MPI processes >>> > type: hypre >>> > HYPRE BoomerAMG preconditioning >>> > HYPRE BoomerAMG: Cycle type V >>> > HYPRE BoomerAMG: Maximum number of levels 25 >>> > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 >>> > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 >>> > HYPRE BoomerAMG: Threshold for strong coupling 0.25 >>> > HYPRE BoomerAMG: Interpolation truncation factor 0 >>> > HYPRE BoomerAMG: Interpolation: max elements per row 0 >>> > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 >>> > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 >>> > HYPRE BoomerAMG: Maximum row sums 0.9 >>> > HYPRE BoomerAMG: Sweeps down 1 >>> > HYPRE BoomerAMG: Sweeps up 1 >>> > HYPRE BoomerAMG: Sweeps on coarse 1 >>> > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi >>> > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi >>> > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination >>> > HYPRE BoomerAMG: Relax weight (all) 1 >>> > HYPRE BoomerAMG: Outer relax weight (all) 1 >>> > HYPRE BoomerAMG: Using CF-relaxation >>> > HYPRE BoomerAMG: Measure type local >>> > HYPRE BoomerAMG: Coarsen type PMIS >>> > HYPRE BoomerAMG: Interpolation type classical >>> > linear system matrix = precond matrix: >>> > Mat Object: (fieldsplit_u_) 8 MPI processes >>> > type: mpiaij >>> > rows=792333, cols=792333 >>> > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 >>> > total number of mallocs used during MatSetValues calls =0 >>> > using I-node (on process 0) routines: found 30057 nodes, >>> limit used is 5 >>> > >>> > In other test, I can see the block size bs=3 in the section of Mat >>> Object >>> > >>> > Regardless the setup cost of Hypre AMG, I saw it gives quite a radical >>> performance, providing that the material parameters does not vary strongly, >>> and the geometry is regular enough. >>> > >>> > >>> > Giang >>> > >>> > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley >>> wrote: >>> > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui >>> wrote: >>> > DO you mean the option pc_fieldsplit_block_size? In this thread: >>> > >>> > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error >>> > >>> > No. "Block Size" is confusing on PETSc since it is used to do several >>> things. Here block size >>> > is being used to split the matrix. You do not need this since you are >>> prescribing your splits. The >>> > matrix block size is used two ways: >>> > >>> > 1) To indicate that matrix values come in logically dense blocks >>> > >>> > 2) To change the storage to match this logical arrangement >>> > >>> > After everything works, we can just indicate to the submatrix which is >>> extracted that it has a >>> > certain block size. However, for the Laplacian I expect it not to >>> matter. >>> > >>> > It assumes you have a constant number of fields at each grid point, am >>> I right? 
However, my field split is not constant, like >>> > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y >>> u3_z p_3 u4_x u4_y u4_z] >>> > >>> > Subsequently the fieldsplit is >>> > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z >>> u4_x u4_y u4_z] >>> > [p_1 p_3] >>> > >>> > Then what is the option to set block size 3 for split 0? >>> > >>> > Sorry, I search several forum threads but cannot figure out the >>> options as you said. >>> > >>> > >>> > >>> > You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> > work better with the block size set. However, if its the P2 Laplacian >>> it does not really matter since its uncoupled. >>> > >>> > Yes, I agree it's uncoupled with the other field, but the crucial >>> factor defining the quality of the block preconditioner is the approximate >>> inversion of individual block. I would merely try block Jacobi first, >>> because it's quite simple. Nevertheless, fieldsplit implements other nice >>> things, like Schur complement, etc. >>> > >>> > I think concepts are getting confused here. I was talking about the >>> interaction of components in one block (the P2 block). You >>> > are talking about interaction between blocks. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > Giang >>> > >>> > >>> > >>> > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley >>> wrote: >>> > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >>> wrote: >>> > Hi Matt >>> > I would rather like to set the block size for block P2 too. Why? >>> > >>> > Because in one of my test (for problem involves only [u_x u_y u_z]), >>> the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >>> increases to 140 if block size is 1 (see attached files). >>> > >>> > You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> > work better with the block size set. However, if its the P2 Laplacian >>> it does not really matter since its uncoupled. >>> > >>> > This gives me the impression that AMG will give better inversion for >>> "P2" block if I can set its block size to 3. Of course it's still an >>> hypothesis but worth to try. >>> > >>> > Another question: In one of the Petsc presentation, you said the Hypre >>> AMG does not scale well, because set up cost amortize the iterations. How >>> is it quantified? and what is the memory overhead? >>> > >>> > I said the Hypre setup cost is not scalable, but it can be amortized >>> over the iterations. You can quantify this >>> > just by looking at the PCSetUp time as your increase the number of >>> processes. I don't think they have a good >>> > model for the memory usage, and if they do, I do not know what it is. >>> However, generally Hypre takes more >>> > memory than the agglomeration MG like ML or GAMG. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > >>> > Giang >>> > >>> > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>> > Hoang Giang Bui writes: >>> > >>> > > Why P2/P2 is not for co-located discretization? >>> > >>> > Matt typed "P2/P2" when me meant "P2/P1". >>> > >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> > -- Norbert Wiener >>> > >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Wed Jan 27 11:34:09 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 27 Jan 2016 17:34:09 +0000 Subject: [petsc-users] Basic vector calculation question Message-ID: Hello Suppose I have four vectors A,B, C and D with the same number of components. I would like to do the following component wise vector operation: (A ? B) / (C + D) I could calculate A-B and C+D and store the results in temporary PETSc vector temp_result1 and temp_result2 (I need to keep A,B, C and D unchanged) and then call VecPointwiseDivide(temp_result1,temp_result1,temp_result2) and get my result in temp_result1. On the other hand, to avoid creating two temporary PETSc vectors, I could call VecGetArray()/VecRestoreArray() on the four vectors, iterate over the local indices and perform the same operations at once and store the result in another vector. What is the best way and why? I think that VecGetArray() creates temporary sequential vectors as well. I?m not sure if this is what the above mentioned PETSc routines internally do. Thanks Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From hong at aspiritech.org Wed Jan 27 12:02:15 2016 From: hong at aspiritech.org (hong at aspiritech.org) Date: Wed, 27 Jan 2016 12:02:15 -0600 Subject: [petsc-users] Basic vector calculation question In-Reply-To: References: Message-ID: Salazar: Use VecGetArray()/VecRestoreArray(). They access arrays, do not create any temporary sequential vectors. Hong Hello > > Suppose I have four vectors A,B, C and D with the same number of > components. I would like to do the following component wise vector > operation: > > (A ? B) / (C + D) > > I could calculate A-B and C+D and store the results in temporary PETSc > vector temp_result1 and temp_result2 (I need to keep A,B, C and D > unchanged) and then call > VecPointwiseDivide(temp_result1,temp_result1,temp_result2) and get my > result in temp_result1. > > On the other hand, to avoid creating two temporary PETSc vectors, I could > call VecGetArray()/VecRestoreArray() on the four vectors, iterate over the > local indices and perform the same operations at once and store the result > in another vector. What is the best way and why? I think that VecGetArray() > creates temporary sequential vectors as well. I?m not sure if this is what > the above mentioned PETSc routines internally do. > > Thanks > Miguel > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Wed Jan 27 15:49:02 2016 From: epscodes at gmail.com (Xiangdong) Date: Wed, 27 Jan 2016 16:49:02 -0500 Subject: [petsc-users] repartition for dynamic load balancing Message-ID: Hello everyone, I have a question on dynamic load balance in petsc. I started running a simulation with one partition. As the simulation goes on, that partition may lead to load imbalance since it is a non-steady problem. If it is worth to perform the load balance, is there an easy way to re-partition the mesh and continue the simulation? Thanks. Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... 
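Coming back to the element-wise operation (A - B) / (C + D) asked about above: following the VecGetArray()/VecRestoreArray() suggestion, a minimal sketch could look as below. It assumes a result vector R that has already been created with the same parallel layout as A, B, C and D, and it uses the read-only accessors for the inputs.

  const PetscScalar *a, *b, *c, *d;
  PetscScalar       *r;
  PetscInt          i, n;
  ierr = VecGetLocalSize(R, &n); CHKERRQ(ierr);
  ierr = VecGetArrayRead(A, &a); CHKERRQ(ierr);
  ierr = VecGetArrayRead(B, &b); CHKERRQ(ierr);
  ierr = VecGetArrayRead(C, &c); CHKERRQ(ierr);
  ierr = VecGetArrayRead(D, &d); CHKERRQ(ierr);
  ierr = VecGetArray(R, &r); CHKERRQ(ierr);
  for (i = 0; i < n; i++) r[i] = (a[i] - b[i]) / (c[i] + d[i]);   /* purely local, no communication */
  ierr = VecRestoreArray(R, &r); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(D, &d); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(C, &c); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(B, &b); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(A, &a); CHKERRQ(ierr);

No temporary vectors are created; the alternative of forming A-B and C+D in two work vectors (e.g. with VecWAXPY()) and then calling VecPointwiseDivide() gives the same result at the cost of the extra storage.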
URL: From jed at jedbrown.org Wed Jan 27 23:20:21 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 Jan 2016 22:20:21 -0700 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: Message-ID: <87lh7ap9ey.fsf@jedbrown.org> Xiangdong writes: > I have a question on dynamic load balance in petsc. I started running a > simulation with one partition. As the simulation goes on, that partition > may lead to load imbalance since it is a non-steady problem. If it is worth > to perform the load balance, is there an easy way to re-partition the mesh > and continue the simulation? Are you using a PETSc DM? What "mesh"? If you own it, then repartitioning it is entirely your business. In general, after adapting the mesh, you rebuild all algebraic data structures. Solvers can be reset (SNESReset, etc.). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bikash at umich.edu Thu Jan 28 01:32:15 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 28 Jan 2016 02:32:15 -0500 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags Message-ID: Hi, I was trying to use BVOrthogonalize() function in SLEPc. For smaller problems (10-20 vectors of length < 20,000) I'm able to use it without any trouble. For larger problems ( > 150 vectors of length > 400,000) the code aborts citing an MPI_AllReduce error with following message: Scalar value must be same on all processes, argument # 3. I was skeptical that the PETSc compilation might be faulty and tried to build a minimalistic version omitting the previously used -xcore-avx2 flags in CFLAGS abd CXXFLAGS. That seemed to have done the cure. What perplexes me is that I have been using the same code with -xcore-avx2 flags in PETSc build on a local cluster at the University of Michigan without any problem. It is only until recently when I moved to Xsede's Comet machine, that I started getting this MPI_AllReduce error with -xcore-avx2. Do you have any clue on why the same PETSc build fails on two different machines just because of a build flag? Regards, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Jan 28 01:56:42 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 28 Jan 2016 08:56:42 +0100 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: > El 28 ene 2016, a las 8:32, Bikash Kanungo escribi?: > > Hi, > > I was trying to use BVOrthogonalize() function in SLEPc. For smaller problems (10-20 vectors of length < 20,000) I'm able to use it without any trouble. For larger problems ( > 150 vectors of length > 400,000) the code aborts citing an MPI_AllReduce error with following message: > > Scalar value must be same on all processes, argument # 3. > > I was skeptical that the PETSc compilation might be faulty and tried to build a minimalistic version omitting the previously used -xcore-avx2 flags in CFLAGS abd CXXFLAGS. That seemed to have done the cure. > > What perplexes me is that I have been using the same code with -xcore-avx2 flags in PETSc build on a local cluster at the University of Michigan without any problem. It is only until recently when I moved to Xsede's Comet machine, that I started getting this MPI_AllReduce error with -xcore-avx2. 
> > Do you have any clue on why the same PETSc build fails on two different machines just because of a build flag? > > Regards, > Bikash > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > Without the complete error message I cannot tell the exact point where it is failing. Jose From bikash at umich.edu Thu Jan 28 02:13:38 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 28 Jan 2016 03:13:38 -0500 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: Hi Jose, Here is the complete error message: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014 [0]PETSC ERROR: Unknown Name on a intel-openmpi_ib named comet-03-60.sdsc.edu by bikashk Thu Jan 28 00:09:17 2016 [0]PETSC ERROR: Configure options CFLAGS="-fPIC -xcore-avx2" FFLAGS="-fPIC -xcore-avx2" CXXFLAGS="-fPIC -xcore-avx2" --prefix=/opt/petsc/intel/openmpi_ib --with-mpi=true --download-pastix=../pastix_5.2.2.12.tar.bz2 --download-ptscotch=../scotch_6.0.0_esmumps.tar.gz --with-blas-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-lapack-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-superlu_dist-include=/opt/superlu/intel/openmpi_ib/include --with-superlu_dist-lib="-L/opt/superlu/intel/openmpi_ib/lib -lsuperlu" --with-parmetis-dir=/opt/parmetis/intel/openmpi_ib --with-metis-dir=/opt/parmetis/intel/openmpi_ib --with-mpi-dir=/opt/openmpi/intel/ib --with-scalapack-dir=/opt/scalapack/intel/openmpi_ib --download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-blacs=../blacs-dev.tar.gz --download-fblaslapack=../fblaslapack-3.4.2.tar.gz --with-pic=true --with-shared-libraries=1 --with-hdf5=true --with-hdf5-dir=/opt/hdf5/intel/openmpi_ib --with-debugging=false [0]PETSC ERROR: #1 BVScaleColumn() line 380 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvops.c [0]PETSC ERROR: #2 BVOrthogonalize_GS() line 474 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #3 BVOrthogonalize() line 535 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c [comet-03-60:27927] *** Process received signal *** [comet-03-60:27927] Signal: Aborted (6) On Thu, Jan 28, 2016 at 2:56 AM, Jose E. Roman wrote: > > > El 28 ene 2016, a las 8:32, Bikash Kanungo escribi?: > > > > Hi, > > > > I was trying to use BVOrthogonalize() function in SLEPc. For smaller > problems (10-20 vectors of length < 20,000) I'm able to use it without any > trouble. 
For larger problems ( > 150 vectors of length > 400,000) the code > aborts citing an MPI_AllReduce error with following message: > > > > Scalar value must be same on all processes, argument # 3. > > > > I was skeptical that the PETSc compilation might be faulty and tried to > build a minimalistic version omitting the previously used -xcore-avx2 flags > in CFLAGS abd CXXFLAGS. That seemed to have done the cure. > > > > What perplexes me is that I have been using the same code with > -xcore-avx2 flags in PETSc build on a local cluster at the University of > Michigan without any problem. It is only until recently when I moved to > Xsede's Comet machine, that I started getting this MPI_AllReduce error with > -xcore-avx2. > > > > Do you have any clue on why the same PETSc build fails on two different > machines just because of a build flag? > > > > Regards, > > Bikash > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > Without the complete error message I cannot tell the exact point where it > is failing. > Jose > > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.magri at dicea.unipd.it Thu Jan 28 03:04:17 2016 From: victor.magri at dicea.unipd.it (victor.magri at dicea.unipd.it) Date: Thu, 28 Jan 2016 10:04:17 +0100 Subject: [petsc-users] PETSc interface to MueLu Message-ID: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Dear PETSc developers, is it possible to create an interface for MueLu (given its dependencies to other Trilinos packages)? Do you plan to do that in the future? Thank you! -- Victor A. P. Magri - PhD student Dept. of Civil, Environmental and Architectural Eng. University of Padova Via Marzolo, 9 - 35131 Padova, Italy -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Jan 28 04:18:25 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 28 Jan 2016 11:18:25 +0100 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: > El 28 ene 2016, a las 9:13, Bikash Kanungo escribi?: > > Hi Jose, > > Here is the complete error message: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014 > [0]PETSC ERROR: Unknown Name on a intel-openmpi_ib named comet-03-60.sdsc.edu by bikashk Thu Jan 28 00:09:17 2016 > [0]PETSC ERROR: Configure options CFLAGS="-fPIC -xcore-avx2" FFLAGS="-fPIC -xcore-avx2" CXXFLAGS="-fPIC -xcore-avx2" --prefix=/opt/petsc/intel/openmpi_ib --with-mpi=true --download-pastix=../pastix_5.2.2.12.tar.bz2 --download-ptscotch=../scotch_6.0.0_esmumps.tar.gz --with-blas-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-lapack-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-superlu_dist-include=/opt/superlu/intel/openmpi_ib/include --with-superlu_dist-lib="-L/opt/superlu/intel/openmpi_ib/lib -lsuperlu" --with-parmetis-dir=/opt/parmetis/intel/openmpi_ib --with-metis-dir=/opt/parmetis/intel/openmpi_ib --with-mpi-dir=/opt/openmpi/intel/ib --with-scalapack-dir=/opt/scalapack/intel/openmpi_ib --download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-blacs=../blacs-dev.tar.gz --download-fblaslapack=../fblaslapack-3.4.2.tar.gz --with-pic=true --with-shared-libraries=1 --with-hdf5=true --with-hdf5-dir=/opt/hdf5/intel/openmpi_ib --with-debugging=false > [0]PETSC ERROR: #1 BVScaleColumn() line 380 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvops.c > [0]PETSC ERROR: #2 BVOrthogonalize_GS() line 474 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #3 BVOrthogonalize() line 535 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > [comet-03-60:27927] *** Process received signal *** > [comet-03-60:27927] Signal: Aborted (6) > > Here are some comments: - These kind of errors appear only in debugging mode. I don't know why you are getting them since you have --with-debugging=false - The flag -xcore-avx2 enables fused multiply-add (FMA) instructions, which means you get slightly more accurate floating-point results. This could explain why you get different behaviour with/without this flag. - The argument of BVScaleColumn() is guaranteed to be the same in all processes, so the only explanation is that it has become a NaN. [Note that in petsc-master (and hence petsc-3.7) NaN's no longer trigger this error.] - My conclusion is that your column vectors of the BV object are not linearly independent, so eventually the vector norm is (almost) zero. The error will appear only if the computed value is exactly zero. In summary: BVOrthogonalize() is new in SLEPc, and it is not very well tested. In particular, linearly dependent vectors are not handled well. For the next release I will add code to take into account rank-deficient BV's. In the meantime, you may want to try running with '-bv_orthog_block chol' (it uses a different orthogonalization algorithm). 
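For reference, the same choice can also be made in code instead of on the command line; a sketch, assuming the BVSetOrthogonalization() signature of more recent SLEPc releases (the block-type argument may not be available in 3.5.x, in which case the '-bv_orthog_block chol' option is the way to go), with bv being the user's BV object:

  ierr = BVSetOrthogonalization(bv, BV_ORTHOG_CGS, BV_ORTHOG_REFINE_IFNEEDED,
                                PETSC_DEFAULT, BV_ORTHOG_BLOCK_CHOL); CHKERRQ(ierr);
  ierr = BVOrthogonalize(bv, NULL); CHKERRQ(ierr);

Checking the column norms beforehand (e.g. with BVNormColumn()) is a cheap way to spot the nearly dependent columns that trigger the failure described above.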
Jose From bikash at umich.edu Thu Jan 28 04:45:44 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 28 Jan 2016 05:45:44 -0500 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: Yeah I suspected linear dependence. But I was puzzled by the error occurring in one machine and not the other. But even on the machine that it failed, it failed for some runs and passed successfully for others. So it suggests that the vector norm is almost zero in certain cases (i.e, in the runs that survive) and zero in others (i.e., the runs that fail). I'll use -bv_orthog_block chol to see if the error persists. Thanks a ton, Jose. Regards, Bikash On Thu, Jan 28, 2016 at 5:18 AM, Jose E. Roman wrote: > > > El 28 ene 2016, a las 9:13, Bikash Kanungo escribi?: > > > > Hi Jose, > > > > Here is the complete error message: > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Invalid argument > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014 > > [0]PETSC ERROR: Unknown Name on a intel-openmpi_ib named > comet-03-60.sdsc.edu by bikashk Thu Jan 28 00:09:17 2016 > > [0]PETSC ERROR: Configure options CFLAGS="-fPIC -xcore-avx2" > FFLAGS="-fPIC -xcore-avx2" CXXFLAGS="-fPIC -xcore-avx2" > --prefix=/opt/petsc/intel/openmpi_ib --with-mpi=true > --download-pastix=../pastix_5.2.2.12.tar.bz2 > --download-ptscotch=../scotch_6.0.0_esmumps.tar.gz > --with-blas-lib="-Wl,--start-group > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a > > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a > -Wl,--end-group -lpthread -lm" --with-lapack-lib="-Wl,--start-group > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a > > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a > -Wl,--end-group -lpthread -lm" > --with-superlu_dist-include=/opt/superlu/intel/openmpi_ib/include > --with-superlu_dist-lib="-L/opt/superlu/intel/openmpi_ib/lib -lsuperlu" > --with-parmetis-dir=/opt/parmetis/intel/openmpi_ib > --with-metis-dir=/opt/parmetis/intel/openmpi_ib > --with-mpi-dir=/opt/openmpi/intel/ib > --with-scalapack-dir=/opt/scalapack/intel/openmpi_ib > --download-mumps=../MUMPS_4.10.0-p3.tar.gz > --download-blacs=../blacs-dev.tar.gz > --download-fblaslapack=../fblaslapack-3.4.2.tar.gz --with-pic=true > --with-shared-libraries=1 --with-hdf5=true > --with-hdf5-dir=/opt/hdf5/intel/openmpi_ib --with-debugging=false > > [0]PETSC ERROR: #1 BVScaleColumn() line 380 in > /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvops.c > > [0]PETSC ERROR: #2 BVOrthogonalize_GS() line 474 in > /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > > [0]PETSC ERROR: #3 BVOrthogonalize() line 535 in > /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > > [comet-03-60:27927] *** Process received signal *** > > [comet-03-60:27927] Signal: Aborted (6) > > > > > > Here are some comments: > - These kind of errors appear 
only in debugging mode. I don't know why you > are getting them since you have --with-debugging=false > - The flag -xcore-avx2 enables fused multiply-add (FMA) instructions, > which means you get slightly more accurate floating-point results. This > could explain why you get different behaviour with/without this flag. > - The argument of BVScaleColumn() is guaranteed to be the same in all > processes, so the only explanation is that it has become a NaN. [Note that > in petsc-master (and hence petsc-3.7) NaN's no longer trigger this error.] > - My conclusion is that your column vectors of the BV object are not > linearly independent, so eventually the vector norm is (almost) zero. The > error will appear only if the computed value is exactly zero. > > In summary: BVOrthogonalize() is new in SLEPc, and it is not very well > tested. In particular, linearly dependent vectors are not handled well. For > the next release I will add code to take into account rank-deficient BV's. > In the meantime, you may want to try running with '-bv_orthog_block chol' > (it uses a different orthogonalization algorithm). > > Jose > > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jan 28 10:32:53 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 28 Jan 2016 10:32:53 -0600 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Message-ID: Victor, What are the differences between MueLu and ML? Hong On Thu, Jan 28, 2016 at 3:04 AM, wrote: > Dear PETSc developers, > > is it possible to create an interface for MueLu (given its dependencies to > other Trilinos packages)? Do you plan to do that in the future? > > Thank you! > -- > Victor A. P. Magri - PhD student > Dept. of Civil, Environmental and Architectural Eng. > University of Padova > Via Marzolo, 9 - 35131 Padova, Italy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.magri at dicea.unipd.it Thu Jan 28 11:09:17 2016 From: victor.magri at dicea.unipd.it (victor.magri at dicea.unipd.it) Date: Thu, 28 Jan 2016 18:09:17 +0100 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Message-ID: <47f4ad28cd4241b61690f8bf2dc8e9e7@dicea.unipd.it> Dear Hong, According to this link http://www.fastmath-scidac.org/software/mlmuelu.html [1] MueLu is the sucessor to ML and should support a larger number of scalar types. Also according to this presentation https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/MueLuOverview_TUG2013.pdf [2] I suppose that MueLu would give a cleaner implementation of ML's features and possibly give a faster code. Also, as it supports the Kokkos library, we would have the possibility to run on MPI+threads or MPI+GPU. However, I think that implementing an interface for this could be a problem since PETSc works better with pure MPI, please correct me if I am wrong about this. Anyway, if it were possible, I just would like to try both multigrid implementations through PETSc and see how they behave. Thank you! Il 28-01-2016 17:32 Hong ha scritto: > Victor, > What are the differences between MueLu and ML? 
> Hong > > On Thu, Jan 28, 2016 at 3:04 AM, wrote: > >> Dear PETSc developers, >> >> is it possible to create an interface for MueLu (given its dependencies to other Trilinos packages)? Do you plan to do that in the future? >> >> Thank you! >> -- >> >> Victor A. P. Magri - PhD student >> Dept. of Civil, Environmental and Architectural Eng. >> University of Padova >> Via Marzolo, 9 - 35131 Padova, Italy -- Victor A. P. Magri - PhD student Dept. of Civil, Environmental and Architectural Eng. University of Padova Via Marzolo, 9 - 35131 Padova, Italy Links: ------ [1] http://www.fastmath-scidac.org/software/mlmuelu.html [2] https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/MueLuOverview_TUG2013.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jan 28 11:11:50 2016 From: epscodes at gmail.com (Xiangdong) Date: Thu, 28 Jan 2016 12:11:50 -0500 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: <87lh7ap9ey.fsf@jedbrown.org> References: <87lh7ap9ey.fsf@jedbrown.org> Message-ID: Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with Nx=10 and np=2. At the beginning each processor owns 5 cells. After some simulation time, I found that repartition the 10 cells into 3 and 7 is better for load balancing. Is there an easy/efficient way to migrate data from one partition to another partition? I am wondering whether there are some functions or libraries help me manage this redistribution. Thanks. Xiangdong On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: > Xiangdong writes: > > > I have a question on dynamic load balance in petsc. I started running a > > simulation with one partition. As the simulation goes on, that partition > > may lead to load imbalance since it is a non-steady problem. If it is > worth > > to perform the load balance, is there an easy way to re-partition the > mesh > > and continue the simulation? > > Are you using a PETSc DM? What "mesh"? If you own it, then > repartitioning it is entirely your business. > > In general, after adapting the mesh, you rebuild all algebraic data > structures. Solvers can be reset (SNESReset, etc.). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 28 11:21:52 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 28 Jan 2016 11:21:52 -0600 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> Message-ID: <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: > > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with Nx=10 and np=2. At the beginning each processor owns 5 cells. After some simulation time, I found that repartition the 10 cells into 3 and 7 is better for load balancing. Is there an easy/efficient way to migrate data from one partition to another partition? I am wondering whether there are some functions or libraries help me manage this redistribution. For DMDA we don't provide tools for doing this, nor do we expect to. For this type of need for dynamic migration we recommend using DMPlex or some external mesh management system. Barry > > Thanks. > Xiangdong > > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: > Xiangdong writes: > > > I have a question on dynamic load balance in petsc. I started running a > > simulation with one partition. 
As the simulation goes on, that partition > > may lead to load imbalance since it is a non-steady problem. If it is worth > > to perform the load balance, is there an easy way to re-partition the mesh > > and continue the simulation? > > Are you using a PETSc DM? What "mesh"? If you own it, then > repartitioning it is entirely your business. > > In general, after adapting the mesh, you rebuild all algebraic data > structures. Solvers can be reset (SNESReset, etc.). > From knepley at gmail.com Thu Jan 28 11:25:07 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 11:25:07 -0600 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Message-ID: On Thu, Jan 28, 2016 at 3:04 AM, wrote: > Dear PETSc developers, > > is it possible to create an interface for MueLu (given its dependencies to > other Trilinos packages)? Do you plan to do that in the future? > > Right now, itsa not clear that this provides anything our ML interface does not. Matt > Thank you! > -- > Victor A. P. Magri - PhD student > Dept. of Civil, Environmental and Architectural Eng. > University of Padova > Via Marzolo, 9 - 35131 Padova, Italy > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jan 28 11:36:42 2016 From: epscodes at gmail.com (Xiangdong) Date: Thu, 28 Jan 2016 12:36:42 -0500 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: What functions/tools can I use for dynamic migration in DMPlex framework? Can you also name some external mesh management systems? Thanks. Xiangdong On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith wrote: > > > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: > > > > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with > Nx=10 and np=2. At the beginning each processor owns 5 cells. After some > simulation time, I found that repartition the 10 cells into 3 and 7 is > better for load balancing. Is there an easy/efficient way to migrate data > from one partition to another partition? I am wondering whether there are > some functions or libraries help me manage this redistribution. > > For DMDA we don't provide tools for doing this, nor do we expect to. For > this type of need for dynamic migration we recommend using DMPlex or some > external mesh management system. > > Barry > > > > > Thanks. > > Xiangdong > > > > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: > > Xiangdong writes: > > > > > I have a question on dynamic load balance in petsc. I started running a > > > simulation with one partition. As the simulation goes on, that > partition > > > may lead to load imbalance since it is a non-steady problem. If it is > worth > > > to perform the load balance, is there an easy way to re-partition the > mesh > > > and continue the simulation? > > > > Are you using a PETSc DM? What "mesh"? If you own it, then > > repartitioning it is entirely your business. > > > > In general, after adapting the mesh, you rebuild all algebraic data > > structures. Solvers can be reset (SNESReset, etc.). 
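The reset pattern described just above might look roughly as follows once a rebalanced DM and the migrated state are in hand (dmNew and Xnew are placeholder names for the repartitioned DM and the transferred solution):

  /* discard the solver's internal data built for the old partition */
  ierr = SNESReset(snes); CHKERRQ(ierr);
  /* attach the repartitioned DM; operators and work vectors are rebuilt from it */
  ierr = SNESSetDM(snes, dmNew); CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes); CHKERRQ(ierr);
  /* continue the simulation with the migrated solution as the initial guess */
  ierr = SNESSolve(snes, NULL, Xnew); CHKERRQ(ierr);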
> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 28 11:47:49 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 11:47:49 -0600 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: > What functions/tools can I use for dynamic migration in DMPlex framework? > In this paper, http://arxiv.org/abs/1506.06194, we explain how to use the DMPlexMigrate() function to redistribute data. In the future, its likely we will add a function that wraps it up with determination of the new partition at the same time. > Can you also name some external mesh management systems? Thanks. > I will note that if load balance in the solve is your only concern, PCTelescope can redistribute the DMDA solve. Thanks, Matt > > Xiangdong > > On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith wrote: > >> >> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >> > >> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with >> Nx=10 and np=2. At the beginning each processor owns 5 cells. After some >> simulation time, I found that repartition the 10 cells into 3 and 7 is >> better for load balancing. Is there an easy/efficient way to migrate data >> from one partition to another partition? I am wondering whether there are >> some functions or libraries help me manage this redistribution. >> >> For DMDA we don't provide tools for doing this, nor do we expect to. >> For this type of need for dynamic migration we recommend using DMPlex or >> some external mesh management system. >> >> Barry >> >> > >> > Thanks. >> > Xiangdong >> > >> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: >> > Xiangdong writes: >> > >> > > I have a question on dynamic load balance in petsc. I started running >> a >> > > simulation with one partition. As the simulation goes on, that >> partition >> > > may lead to load imbalance since it is a non-steady problem. If it is >> worth >> > > to perform the load balance, is there an easy way to re-partition the >> mesh >> > > and continue the simulation? >> > >> > Are you using a PETSc DM? What "mesh"? If you own it, then >> > repartitioning it is entirely your business. >> > >> > In general, after adapting the mesh, you rebuild all algebraic data >> > structures. Solvers can be reset (SNESReset, etc.). >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu Jan 28 13:37:49 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 28 Jan 2016 20:37:49 +0100 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: On Thursday, 28 January 2016, Matthew Knepley > wrote: > On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: > >> What functions/tools can I use for dynamic migration in DMPlex framework? >> > > In this paper, http://arxiv.org/abs/1506.06194, we explain how to use the > DMPlexMigrate() function to redistribute data. > In the future, its likely we will add a function that wraps it up with > determination of the new partition at the same time. 
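A rough sketch of what the redistribution step could look like with the existing API (not the wrapper mentioned above), assuming the petsc-3.6-style DMPlexDistribute() signature; dm, sec, state and their *New counterparts are placeholder names, and secNew/stateNew are assumed to have been created already for the new layout:

  DM      dmNew = NULL;
  PetscSF sfMigration;
  /* ask the partitioner attached to dm for a new partition and migrate the topology */
  ierr = DMPlexDistribute(dm, 0, &sfMigration, &dmNew); CHKERRQ(ierr);
  if (dmNew) {
    /* move the cell-wise simulation state to the new layout using the migration SF */
    ierr = DMPlexDistributeField(dm, sfMigration, sec, state, secNew, stateNew); CHKERRQ(ierr);
    ierr = PetscSFDestroy(&sfMigration); CHKERRQ(ierr);
    ierr = DMDestroy(&dm); CHKERRQ(ierr);
    dm = dmNew;
  }

After this the solver can be reset and pointed at dmNew, as in the SNESReset() sketch earlier in the thread.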
> > >> Can you also name some external mesh management systems? Thanks. >> > > I will note that if load balance in the solve is your only concern, > PCTelescope can redistribute the DMDA solve. > Currently Telescope will only repartition 2d and 3d DMDA's. It does perform data migration and allows users to specify the number of ranks to be used in each I,j,k direction via -xxx_grid_x etc. I wouldn't say it supports "load balancing", as there is no mechanism to define number of points in each sub-domain Cheers Dave > > Thanks, > > Matt > > >> >> Xiangdong >> >> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith wrote: >> >>> >>> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >>> > >>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with >>> Nx=10 and np=2. At the beginning each processor owns 5 cells. After some >>> simulation time, I found that repartition the 10 cells into 3 and 7 is >>> better for load balancing. Is there an easy/efficient way to migrate data >>> from one partition to another partition? I am wondering whether there are >>> some functions or libraries help me manage this redistribution. >>> >>> For DMDA we don't provide tools for doing this, nor do we expect to. >>> For this type of need for dynamic migration we recommend using DMPlex or >>> some external mesh management system. >>> >>> Barry >>> >>> > >>> > Thanks. >>> > Xiangdong >>> > >>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: >>> > Xiangdong writes: >>> > >>> > > I have a question on dynamic load balance in petsc. I started >>> running a >>> > > simulation with one partition. As the simulation goes on, that >>> partition >>> > > may lead to load imbalance since it is a non-steady problem. If it >>> is worth >>> > > to perform the load balance, is there an easy way to re-partition >>> the mesh >>> > > and continue the simulation? >>> > >>> > Are you using a PETSc DM? What "mesh"? If you own it, then >>> > repartitioning it is entirely your business. >>> > >>> > In general, after adapting the mesh, you rebuild all algebraic data >>> > structures. Solvers can be reset (SNESReset, etc.). >>> > >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 28 13:41:45 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 13:41:45 -0600 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: On Thu, Jan 28, 2016 at 1:37 PM, Dave May wrote: > > > On Thursday, 28 January 2016, Matthew Knepley wrote: > >> On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: >> >>> What functions/tools can I use for dynamic migration in DMPlex framework? >>> >> >> In this paper, http://arxiv.org/abs/1506.06194, we explain how to use >> the DMPlexMigrate() function to redistribute data. >> In the future, its likely we will add a function that wraps it up with >> determination of the new partition at the same time. >> >> >>> Can you also name some external mesh management systems? Thanks. >>> >> >> I will note that if load balance in the solve is your only concern, >> PCTelescope can redistribute the DMDA solve. >> > > Currently Telescope will only repartition 2d and 3d DMDA's. 
It > does perform data migration and allows users to specify the number of ranks > to be used in each I,j,k direction via -xxx_grid_x etc. I wouldn't say it > supports "load balancing", as there is no mechanism to define number of > points in each sub-domain > Let me be more precise. All I have suggested for any of this are redistribution tools. You will have to determine the right weights for "load balance", which I think is always true. Using the default weights is crazy. Matt > Cheers > Dave > > > >> >> Thanks, >> >> Matt >> >> >>> >>> Xiangdong >>> >>> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith >>> wrote: >>> >>>> >>>> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >>>> > >>>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA >>>> with Nx=10 and np=2. At the beginning each processor owns 5 cells. After >>>> some simulation time, I found that repartition the 10 cells into 3 and 7 is >>>> better for load balancing. Is there an easy/efficient way to migrate data >>>> from one partition to another partition? I am wondering whether there are >>>> some functions or libraries help me manage this redistribution. >>>> >>>> For DMDA we don't provide tools for doing this, nor do we expect to. >>>> For this type of need for dynamic migration we recommend using DMPlex or >>>> some external mesh management system. >>>> >>>> Barry >>>> >>>> > >>>> > Thanks. >>>> > Xiangdong >>>> > >>>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: >>>> > Xiangdong writes: >>>> > >>>> > > I have a question on dynamic load balance in petsc. I started >>>> running a >>>> > > simulation with one partition. As the simulation goes on, that >>>> partition >>>> > > may lead to load imbalance since it is a non-steady problem. If it >>>> is worth >>>> > > to perform the load balance, is there an easy way to re-partition >>>> the mesh >>>> > > and continue the simulation? >>>> > >>>> > Are you using a PETSc DM? What "mesh"? If you own it, then >>>> > repartitioning it is entirely your business. >>>> > >>>> > In general, after adapting the mesh, you rebuild all algebraic data >>>> > structures. Solvers can be reset (SNESReset, etc.). >>>> > >>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jan 28 15:02:45 2016 From: epscodes at gmail.com (Xiangdong) Date: Thu, 28 Jan 2016 16:02:45 -0500 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: I am thinking to use parmetis to repartition the mesh (based on new updated weights for vertices), and use some functions (maybe DMPlexMigrate) to redistribute the data. I will look into Matt's paper to see whether it is possible. Thanks. Xiangdong On Thu, Jan 28, 2016 at 2:41 PM, Matthew Knepley wrote: > On Thu, Jan 28, 2016 at 1:37 PM, Dave May wrote: > >> >> >> On Thursday, 28 January 2016, Matthew Knepley wrote: >> >>> On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: >>> >>>> What functions/tools can I use for dynamic migration in DMPlex >>>> framework? 
>>>> >>> >>> In this paper, http://arxiv.org/abs/1506.06194, we explain how to use >>> the DMPlexMigrate() function to redistribute data. >>> In the future, its likely we will add a function that wraps it up with >>> determination of the new partition at the same time. >>> >>> >>>> Can you also name some external mesh management systems? Thanks. >>>> >>> >>> I will note that if load balance in the solve is your only concern, >>> PCTelescope can redistribute the DMDA solve. >>> >> >> Currently Telescope will only repartition 2d and 3d DMDA's. It >> does perform data migration and allows users to specify the number of ranks >> to be used in each I,j,k direction via -xxx_grid_x etc. I wouldn't say it >> supports "load balancing", as there is no mechanism to define number of >> points in each sub-domain >> > > Let me be more precise. All I have suggested for any of this are > redistribution tools. You will have to determine > the right weights for "load balance", which I think is always true. Using > the default weights is crazy. > > Matt > > >> Cheers >> Dave >> >> >> >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> >>>> Xiangdong >>>> >>>> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith >>>> wrote: >>>> >>>>> >>>>> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >>>>> > >>>>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA >>>>> with Nx=10 and np=2. At the beginning each processor owns 5 cells. After >>>>> some simulation time, I found that repartition the 10 cells into 3 and 7 is >>>>> better for load balancing. Is there an easy/efficient way to migrate data >>>>> from one partition to another partition? I am wondering whether there are >>>>> some functions or libraries help me manage this redistribution. >>>>> >>>>> For DMDA we don't provide tools for doing this, nor do we expect to. >>>>> For this type of need for dynamic migration we recommend using DMPlex or >>>>> some external mesh management system. >>>>> >>>>> Barry >>>>> >>>>> > >>>>> > Thanks. >>>>> > Xiangdong >>>>> > >>>>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown >>>>> wrote: >>>>> > Xiangdong writes: >>>>> > >>>>> > > I have a question on dynamic load balance in petsc. I started >>>>> running a >>>>> > > simulation with one partition. As the simulation goes on, that >>>>> partition >>>>> > > may lead to load imbalance since it is a non-steady problem. If it >>>>> is worth >>>>> > > to perform the load balance, is there an easy way to re-partition >>>>> the mesh >>>>> > > and continue the simulation? >>>>> > >>>>> > Are you using a PETSc DM? What "mesh"? If you own it, then >>>>> > repartitioning it is entirely your business. >>>>> > >>>>> > In general, after adapting the mesh, you rebuild all algebraic data >>>>> > structures. Solvers can be reset (SNESReset, etc.). >>>>> > >>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amneetb at live.unc.edu Thu Jan 28 16:53:13 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 28 Jan 2016 22:53:13 +0000 Subject: [petsc-users] MatCreateSeqDense Message-ID: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Hi Folks, Is there a way to get back the user allocated raw data pointer (column-major order) used in creating MatCreateSeqDense() from the Mat object? Thanks, --Amneet From knepley at gmail.com Thu Jan 28 17:06:34 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 17:06:34 -0600 Subject: [petsc-users] MatCreateSeqDense In-Reply-To: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> References: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Message-ID: On Thu, Jan 28, 2016 at 4:53 PM, Bhalla, Amneet Pal S wrote: > Hi Folks, > > Is there a way to get back the user allocated raw data pointer > (column-major order) used in creating MatCreateSeqDense() from the Mat > object? > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDenseGetArray.html Matt > Thanks, > --Amneet -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 28 18:23:17 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 29 Jan 2016 00:23:17 +0000 Subject: [petsc-users] MatCreateSeqDense In-Reply-To: References: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Message-ID: Thanks! Another related question: If I do something like this: double* data; // do stuff with data data[i] = ... Mat A; MatCreateSeqDense(...,data,..., &A); // do more stuff with data data[i] = .. Now would the matrix A reflect the change (i.e updated A[i][j]) without making an explicit call to PetscObjectStateIncrease((PetscObject)A)? On Jan 28, 2016, at 3:06 PM, Matthew Knepley > wrote: On Thu, Jan 28, 2016 at 4:53 PM, Bhalla, Amneet Pal S > wrote: Hi Folks, Is there a way to get back the user allocated raw data pointer (column-major order) used in creating MatCreateSeqDense() from the Mat object? http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDenseGetArray.html Matt Thanks, --Amneet -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 28 18:26:45 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 28 Jan 2016 18:26:45 -0600 Subject: [petsc-users] MatCreateSeqDense In-Reply-To: References: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Message-ID: > On Jan 28, 2016, at 6:23 PM, Bhalla, Amneet Pal S wrote: > > Thanks! > > Another related question: If I do something like this: > > double* data; > > // do stuff with data > data[i] = ... > > Mat A; > MatCreateSeqDense(...,data,..., &A); > > // do more stuff with data > data[i] = .. > > Now would the matrix A reflect the change (i.e updated A[i][j]) without making an explicit call to PetscObjectStateIncrease((PetscObject)A)? The values in the matrix will be different but the matrix object will not know you have changed anything and hence strange stuff might happen. We highly recommend that after you have created the matrix you do not use the data[] array anymore. 
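In code, the recommended pattern (spelled out below) is something like this sketch, where i, j, m and new_value are placeholders and the leading dimension is assumed to be the number of rows m:

  PetscScalar    *a;
  PetscErrorCode ierr;

  ierr = MatDenseGetArray(A, &a);CHKERRQ(ierr);
  a[i + j*m] = new_value;   /* column-major entry (i,j) of the m x n dense matrix */
  ierr = MatDenseRestoreArray(A, &a);CHKERRQ(ierr);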
Instead call MatDenseGetArray() change stuff MatDenseRestoreArray() this automatically increases the matrix state and is cleaner code anyways (calling MatDenseGetArray() is super fast because it only gets the pointer you already provided). Barry > > >> On Jan 28, 2016, at 3:06 PM, Matthew Knepley wrote: >> >> On Thu, Jan 28, 2016 at 4:53 PM, Bhalla, Amneet Pal S wrote: >> Hi Folks, >> >> Is there a way to get back the user allocated raw data pointer (column-major order) used in creating MatCreateSeqDense() from the Mat object? >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDenseGetArray.html >> >> Matt >> >> Thanks, >> --Amneet >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener > From bisheshkh at gmail.com Fri Jan 29 11:22:12 2016 From: bisheshkh at gmail.com (Bishesh Khanal) Date: Fri, 29 Jan 2016 18:22:12 +0100 Subject: [petsc-users] no protocol specified Message-ID: Hello, I installed petsc today in our new cluster environment, everything looked fine except for several mpi related deprecated function warnings such as below: /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c: In function ?PetscErrorCode PetscObjectName(PetscObject)?: /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c:128:12: warning: ?int MPI_Attr_get(MPI_Comm, int, void*, int*)? is deprecated (declared at /opt/openmpi/gcc/current/include/mpi.h:1227): MPI_Attr_get is superseded by MPI_Comm_get_attr in MPI-2.0 [-Wdeprecated-declarations] ierr = MPI_Attr_get(obj->comm,Petsc_Counter_keyval,(void*)&counter,&flg);CHKERRQ(ierr); When I ran make test after installation, I got the following results: Running test examples to verify correct installation Using PETSC_DIR=/data/asclepios/user/bkhanal/softwares/petscInstalledDebug and PETSC_ARCH= Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html No protocol specified No protocol specified lid velocity = 0.0016, prandtl # = 1, grashof # = 1 Number of SNES iterations = 2 Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html No protocol specified No protocol specified lid velocity = 0.0016, prandtl # = 1, grashof # = 1 Number of SNES iterations = 2 Possible error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html No protocol specified No protocol specified Number of SNES iterations = 4 Completed test examples ========================================= I also tested one of my codes with this new setup. It seems to give me correct results but the output also displays No protocol specified (twice). Is this a mere warning or should I worry about it ? Thanks, Bishesh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Jan 29 11:45:28 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 29 Jan 2016 11:45:28 -0600 Subject: [petsc-users] no protocol specified In-Reply-To: References: Message-ID: On Fri, Jan 29, 2016 at 11:22 AM, Bishesh Khanal wrote: > Hello, > I installed petsc today in our new cluster environment, everything looked > fine except for several mpi related deprecated function warnings such as > below: > > /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c: In > function ?PetscErrorCode PetscObjectName(PetscObject)?: > /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c:128:12: > warning: ?int MPI_Attr_get(MPI_Comm, int, void*, int*)? is deprecated > (declared at /opt/openmpi/gcc/current/include/mpi.h:1227): MPI_Attr_get is > superseded by MPI_Comm_get_attr in MPI-2.0 [-Wdeprecated-declarations] > ierr = > MPI_Attr_get(obj->comm,Petsc_Counter_keyval,(void*)&counter,&flg);CHKERRQ(ierr); > > > When I ran make test after installation, I got the following results: > > Running test examples to verify correct installation > Using PETSC_DIR=/data/asclepios/user/bkhanal/softwares/petscInstalledDebug > and PETSC_ARCH= > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > No protocol specified > No protocol specified > lid velocity = 0.0016, prandtl # = 1, grashof # = 1 > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > processes > See http://www.mcs.anl.gov/petsc/documentation/faq.html > No protocol specified > No protocol specified > lid velocity = 0.0016, prandtl # = 1, grashof # = 1 > Number of SNES iterations = 2 > Possible error running Fortran example src/snes/examples/tutorials/ex5f > with 1 MPI process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > No protocol specified > No protocol specified > Number of SNES iterations = 4 > Completed test examples > ========================================= > > I also tested one of my codes with this new setup. It seems to give me > correct results but the output also displays No protocol specified (twice). > > Is this a mere warning or should I worry about it ? > It looks like it is connected to your MPI configuration on this machine: https://www-auth.cs.wisc.edu/lists/htcondor-users/2013-March/msg00022.shtml Thanks, Matt > Thanks, > Bishesh > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Jan 29 15:44:34 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 29 Jan 2016 14:44:34 -0700 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: <47f4ad28cd4241b61690f8bf2dc8e9e7@dicea.unipd.it> References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> <47f4ad28cd4241b61690f8bf2dc8e9e7@dicea.unipd.it> Message-ID: <87oac4cb7h.fsf@jedbrown.org> victor.magri at dicea.unipd.it writes: > I suppose that MueLu would give a cleaner implementation of ML's > features and possibly give a faster code. Last I heard, the MueLu implementation was slower. I don't know if that has been fixed more recently, but we would be more motivated to write the interface _after_ they demonstrate some clear benefit in a direct comparison. 
> Also, as it supports the Kokkos library, we would have the possibility > to run on MPI+threads or MPI+GPU. However, I think that implementing > an interface for this could be a problem since PETSc works better with > pure MPI, please correct me if I am wrong about this. There's a fair chance their code also works better with pure MPI. (There's no fundamental reason why MPI+threads should be faster than MPI-only, and some reasons why it could be slower.) That said, you can see lots of implementation artifacts on particular machines or for particular implementations. Anyway, there is nothing preventing an interface except the opportunity cost of working on something with such dubious expected value and zero research value, versus things with immediate, direct impact. > Anyway, if it were possible, I just would like to try both multigrid > implementations through PETSc and see how they behave. Patches welcome. (In the short term, and in lieu of some convincing demonstration.) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bikash at umich.edu Sat Jan 30 18:56:26 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Sat, 30 Jan 2016 19:56:26 -0500 Subject: [petsc-users] Segmentation fault in MatAssembly Message-ID: Hi, I'm getting segmentation fault while assembling large matrices (~4000000 X 4000000) across 480 processors. The error usually shows up randomly only in large problems (i.e, when I exceed matrix size of ~2000000x200000). There are few rows in my matrix for which non-local contributions are added from all other processors. So I believe the buffer size during MatSetValues gets larger with matrix size which in some way signals a segmentation fault. So is there a smart way of avoiding such error? Regards, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 30 19:15:06 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 30 Jan 2016 19:15:06 -0600 Subject: [petsc-users] Segmentation fault in MatAssembly In-Reply-To: References: Message-ID: <11F56552-499F-465B-91D0-787BC6857B86@mcs.anl.gov> Bikash The most likely cause is due to integer overflow. For such large problems you should make a new PETSC_ARCH say arch-large and configure with the additional option --with-64-bit-indices then PetscInt will become a 64 bit integer which will never overflow. Make sure that your code always uses PetscInt for integers passed to PETSc and not int or Fortran integer. If you do get crashes you can run in the debugger and likely you will find that somewhere you still have an int around instead of a PetscInt Barry > On Jan 30, 2016, at 6:56 PM, Bikash Kanungo wrote: > > Hi, > > I'm getting segmentation fault while assembling large matrices (~4000000 X 4000000) across 480 processors. The error usually shows up randomly only in large problems (i.e, when I exceed matrix size of ~2000000x200000). There are few rows in my matrix for which non-local contributions are added from all other processors. So I believe the buffer size during MatSetValues gets larger with matrix size which in some way signals a segmentation fault. So is there a smart way of avoiding such error? > > Regards, > Bikash > > -- > Bikash S. 
Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan >
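To make the PetscInt point above concrete: once PETSc is configured --with-64-bit-indices, a plain C int no longer matches PetscInt, so an index array declared as int and handed to MatSetValues() gives PETSc 4-byte data where it expects 8-byte indices, and products such as nrows*ncols can overflow a 32-bit int even earlier. A minimal sketch with hypothetical names (the 3x3 block and ADD_VALUES are only illustrative):

  /* Declare every integer that is passed to (or received from) PETSc as PetscInt,
     never as int, so the sizes stay correct with --with-64-bit-indices */
  PetscInt       rows[3] = {0, 1, 2}, cols[3] = {0, 1, 2};
  PetscScalar    vals[9];          /* values for the 3x3 block, filled elsewhere */
  PetscErrorCode ierr;

  ierr = MatSetValues(A, 3, rows, 3, cols, vals, ADD_VALUES);CHKERRQ(ierr);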