From zocca.marco at gmail.com Sun Jan 3 05:59:41 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Sun, 3 Jan 2016 12:59:41 +0100 Subject: [petsc-users] installation on cloud platform Message-ID: Dear all, has anyone here tried/managed to install PETSc on e.g. Amazon AWS or the Google Compute Engine? I believe some extra components are needed for coordination, e.g. Kubernetes or Mesos (in turn requiring that the library be compiled within some sort of container, e.g. Docker), but I'm a bit lost amid all the options. Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by PETSc compatible with those platforms? Thank you in advance, Marco From knepley at gmail.com Sun Jan 3 07:07:46 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 3 Jan 2016 07:07:46 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca wrote: > Dear all, > > has anyone here tried/managed to install PETSc on e.g. Amazon AWS or > the Google Compute Engine? > > I believe some extra components are needed for coordination, e.g. > Kubernetes or Mesos (in turn requiring that the library be compiled > within some sort of container, e.g. Docker), but I'm a bit lost amid > all the options. > I have no idea what those even do. > Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by > PETSc compatible with those platforms? > There are a bunch of papers documenting MPI performance on AWS. We just use vanilla MPI, so you request a configuration that has it installed. Matt > Thank you in advance, > > Marco > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From stali at geology.wisc.edu Mon Jan 4 09:48:05 2016 From: stali at geology.wisc.edu (Tabrez Ali) Date: Mon, 04 Jan 2016 09:48:05 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: <568A9435.40103@geology.wisc.edu> Or you can install everything yourself. On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by $ cd $ ssh-keygen -t rsa $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys After that the usual stuff, e.g., $ sudo apt-get update $ sudo apt-get upgrade $ sudo apt-get install gcc gfortran g++ cmake wget $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 $ export PETSC_ARCH=arch-linux2-c-opt $ make all $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin Tabrez On 01/03/2016 07:07 AM, Matthew Knepley wrote: > On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca > wrote: > > Dear all, > > has anyone here tried/managed to install PETSc on e.g. Amazon AWS or > the Google Compute Engine? > > I believe some extra components are needed for coordination, e.g. > Kubernetes or Mesos (in turn requiring that the library be compiled > within some sort of container, e.g. Docker), but I'm a bit lost amid > all the options. > > > I have no idea what those even do. > > Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by > PETSc compatible with those platforms? > > > There are a bunch of papers documenting MPI performance on AWS. 
We > just use vanilla MPI, > so you request a configuration that has it installed. > > Matt > > Thank you in advance, > > Marco > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zocca.marco at gmail.com Mon Jan 4 14:07:05 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 4 Jan 2016 21:07:05 +0100 Subject: [petsc-users] installation on cloud platform Message-ID: Hello Tabrez, thank you for the walkthrough; I'll give it a try as soon as possible. My main doubt was indeed regarding the host resolution and security; what does the special hostfile line do? What about that "ip-x-x-x-x" construct? Thank you and kindest regards, Marco > > Or you can install everything yourself. > > On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to > add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by > > $ cd > $ ssh-keygen -t rsa > $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys > > After that the usual stuff, e.g., > > $ sudo apt-get update > $ sudo apt-get upgrade > $ sudo apt-get install gcc gfortran g++ cmake wget > $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz > $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich > --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 > $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 > $ export PETSC_ARCH=arch-linux2-c-opt > $ make all > $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin > > Tabrez > > > On 01/03/2016 07:07 AM, Matthew Knepley wrote: >> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca > > wrote: >> >> Dear all, >> >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >> the Google Compute Engine? >> >> I believe some extra components are needed for coordination, e.g. >> Kubernetes or Mesos (in turn requiring that the library be compiled >> within some sort of container, e.g. Docker), but I'm a bit lost amid >> all the options. >> >> From balay at mcs.anl.gov Mon Jan 4 14:29:11 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 4 Jan 2016 14:29:11 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: Are you interested in using 1 node [with multiple cores] - or multiple nodes - aka cluster on amazon? For a single node - I don't think any additional config should be necesary [/etc/hosts - or ssh keys]. It should be same as any laptop config. Its possible that 'hostname' is not setup properly on amazon nodes - and MPICH misbehaves. In this case - you might need any entry to /etc/hosts. Perhaps something like: echo 127.0.0.1 `hostname` >> /etc/hosts If cluster - then there might be a tutorial to setup a proper cluster with AWS. Googles gives http://cs.smith.edu/dftwiki/index.php/Tutorial:_Create_an_MPI_Cluster_on_the_Amazon_Elastic_Cloud_%28EC2%29 BTW: --download-mpich is a convient way to install MPI. [we default to device=ch3:sock]. But you might want to figureout if there is a better performing MPI for the amazon config. [perhaps mpich with nemesis works well. Or perhaps openmpi. Both are available prebuit on ubunutu...] Satish On Mon, 4 Jan 2016, Marco Zocca wrote: > Hello Tabrez, > > thank you for the walkthrough; I'll give it a try as soon as possible. > > My main doubt was indeed regarding the host resolution and security; > > what does the special hostfile line do? 
What about that "ip-x-x-x-x" > construct? > > Thank you and kindest regards, > > Marco > > > > > > > > Or you can install everything yourself. > > > > On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to > > add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by > > > > $ cd > > $ ssh-keygen -t rsa > > $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys > > > > After that the usual stuff, e.g., > > > > $ sudo apt-get update > > $ sudo apt-get upgrade > > $ sudo apt-get install gcc gfortran g++ cmake wget > > $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz > > $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich > > --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 > > $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 > > $ export PETSC_ARCH=arch-linux2-c-opt > > $ make all > > $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin > > > > Tabrez > > > > > > On 01/03/2016 07:07 AM, Matthew Knepley wrote: > >> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca >> > wrote: > >> > >> Dear all, > >> > >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or > >> the Google Compute Engine? > >> > >> I believe some extra components are needed for coordination, e.g. > >> Kubernetes or Mesos (in turn requiring that the library be compiled > >> within some sort of container, e.g. Docker), but I'm a bit lost amid > >> all the options. > >> > > >> > From stali at geology.wisc.edu Mon Jan 4 14:33:15 2016 From: stali at geology.wisc.edu (Tabrez Ali) Date: Mon, 04 Jan 2016 14:33:15 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: <568AD70B.9010707@geology.wisc.edu> Its just the hostname. Without it mpiexec was hanging for n>1. Might not be an issue with non Debian AMIs (I didn't try). Tabrez On 01/04/2016 02:07 PM, Marco Zocca wrote: > Hello Tabrez, > > thank you for the walkthrough; I'll give it a try as soon as possible. > > My main doubt was indeed regarding the host resolution and security; > > what does the special hostfile line do? What about that "ip-x-x-x-x" > construct? > > Thank you and kindest regards, > > Marco > > > > >> Or you can install everything yourself. >> >> On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to >> add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by >> >> $ cd >> $ ssh-keygen -t rsa >> $ cat .ssh/id_rsa.pub>> .ssh/authorized_keys >> >> After that the usual stuff, e.g., >> >> $ sudo apt-get update >> $ sudo apt-get upgrade >> $ sudo apt-get install gcc gfortran g++ cmake wget >> $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz >> $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich >> --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 >> $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 >> $ export PETSC_ARCH=arch-linux2-c-opt >> $ make all >> $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin >> >> Tabrez >> >> >> On 01/03/2016 07:07 AM, Matthew Knepley wrote: >>> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca>> > wrote: >>> >>> Dear all, >>> >>> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >>> the Google Compute Engine? >>> >>> I believe some extra components are needed for coordination, e.g. >>> Kubernetes or Mesos (in turn requiring that the library be compiled >>> within some sort of container, e.g. Docker), but I'm a bit lost amid >>> all the options. 
>>> From bsmith at mcs.anl.gov Mon Jan 4 14:49:51 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 14:49:51 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: <568A9435.40103@geology.wisc.edu> References: <568A9435.40103@geology.wisc.edu> Message-ID: <4E2A2D98-E904-47F9-A2A8-CF3611C54EC2@mcs.anl.gov> Tabrez, This is great, thanks for sending it. Do you mind if Satish adds it to the http://www.mcs.anl.gov/petsc/documentation/installation.html file as an example? Barry > On Jan 4, 2016, at 9:48 AM, Tabrez Ali wrote: > > Or you can install everything yourself. > > On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by > > $ cd > $ ssh-keygen -t rsa > $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys > > After that the usual stuff, e.g., > > $ sudo apt-get update > $ sudo apt-get upgrade > $ sudo apt-get install gcc gfortran g++ cmake wget > $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz > $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 > $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 > $ export PETSC_ARCH=arch-linux2-c-opt > $ make all > $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin > > Tabrez > > > On 01/03/2016 07:07 AM, Matthew Knepley wrote: >> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca wrote: >> Dear all, >> >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >> the Google Compute Engine? >> >> I believe some extra components are needed for coordination, e.g. >> Kubernetes or Mesos (in turn requiring that the library be compiled >> within some sort of container, e.g. Docker), but I'm a bit lost amid >> all the options. >> >> I have no idea what those even do. >> >> Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by >> PETSc compatible with those platforms? >> >> There are a bunch of papers documenting MPI performance on AWS. We just use vanilla MPI, >> so you request a configuration that has it installed. >> >> Matt >> >> Thank you in advance, >> >> Marco >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener > From stali at geology.wisc.edu Mon Jan 4 15:08:59 2016 From: stali at geology.wisc.edu (Tabrez Ali) Date: Mon, 04 Jan 2016 15:08:59 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: <4E2A2D98-E904-47F9-A2A8-CF3611C54EC2@mcs.anl.gov> References: <568A9435.40103@geology.wisc.edu> <4E2A2D98-E904-47F9-A2A8-CF3611C54EC2@mcs.anl.gov> Message-ID: <568ADF6B.4050804@geology.wisc.edu> Yes, of course. Although additional steps might be needed for enabling GPU support on GPU enabled instances (hard to find otherwise). Regards, Tabrez On 01/04/2016 02:49 PM, Barry Smith wrote: > Tabrez, > > This is great, thanks for sending it. Do you mind if Satish adds it to the http://www.mcs.anl.gov/petsc/documentation/installation.html file as an example? > > Barry > >> On Jan 4, 2016, at 9:48 AM, Tabrez Ali wrote: >> >> Or you can install everything yourself. 
>> >> On vanilla Debian based AMIs (e.g., Ubuntu 14.04 LTS) just make sure to add "127.0.1.1 ip-x-x-x-x" to your /etc/hosts followed by >> >> $ cd >> $ ssh-keygen -t rsa >> $ cat .ssh/id_rsa.pub>> .ssh/authorized_keys >> >> After that the usual stuff, e.g., >> >> $ sudo apt-get update >> $ sudo apt-get upgrade >> $ sudo apt-get install gcc gfortran g++ cmake wget >> $ wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.6.3.tar.gz >> $ ./configure --with-cc=gcc --with-fc=gfortran --download-mpich --download-fblaslapack --with-metis=1 --download-metis=1 --with-debugging=0 >> $ export PETSC_DIR=/home/ubuntu/petsc-3.6.3 >> $ export PETSC_ARCH=arch-linux2-c-opt >> $ make all >> $ export PATH=$PATH:$PETSC_DIR/$PETSC_ARCH/bin >> >> Tabrez >> >> >> On 01/03/2016 07:07 AM, Matthew Knepley wrote: >>> On Sun, Jan 3, 2016 at 5:59 AM, Marco Zocca wrote: >>> Dear all, >>> >>> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >>> the Google Compute Engine? >>> >>> I believe some extra components are needed for coordination, e.g. >>> Kubernetes or Mesos (in turn requiring that the library be compiled >>> within some sort of container, e.g. Docker), but I'm a bit lost amid >>> all the options. >>> >>> I have no idea what those even do. >>> >>> Are the MPI functions (e.g. broadcast, scatter, gather ..?) used by >>> PETSc compatible with those platforms? >>> >>> There are a bunch of papers documenting MPI performance on AWS. We just use vanilla MPI, >>> so you request a configuration that has it installed. >>> >>> Matt >>> >>> Thank you in advance, >>> >>> Marco >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener From zocca.marco at gmail.com Mon Jan 4 16:48:57 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 4 Jan 2016 23:48:57 +0100 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: Hi Satish, thank you for the input; I was really looking for something that lets one abstract out "where" the code lives, so as to possibly work both in a single-node and cluster setting. This is why a "container" approach sounds meaningful. Configure once, run many. For message-passing codes such as our case, there's this `docker-compose` [ https://docs.docker.com/compose ] which aggregates the compilation and network setup steps. I have found this approach [ http://qnib.org/2015/04/14/qnibterminal-mpi-hello-world/ ] that runs an MPI benchmark using `docker-compose`. The "Consul" library takes care of the DNS resolution as far as I can tell, and SLURM is the queue manager. The downsides: it's yet another third party tool (albeit a widespread one), with yet another scripting syntax (very much similar but incompatible with shell script). Latencies will be much larger, I expect, and also one should pay a much higher attention to security (building on top of someone else's images, freely available from the Docker Hub, is tantamount to running arbitrary code at compile time). There are however trusted Docker images, containing various combinations of Linux distributions and software. I'm very much looking forward to continuing this discussion; Kind regards, Marco > Are you interested in using 1 node [with multiple cores] - or multiple > nodes - aka cluster on amazon? > > For a single node - I don't think any additional config should be > necesary [/etc/hosts - or ssh keys]. 
It should be same as any laptop > config. > > Its possible that 'hostname' is not setup properly on amazon nodes - and > MPICH misbehaves. In this case - you might need any entry to > /etc/hosts. Perhaps something like: > > echo 127.0.0.1 `hostname` >> /etc/hosts > > If cluster - then there might be a tutorial to setup a proper cluster with AWS. Googles gives > http://cs.smith.edu/dftwiki/index.php/Tutorial:_Create_an_MPI_Cluster_on_the_Amazon_Elastic_Cloud_%28EC2%29 > > BTW: --download-mpich is a convient way to install MPI. [we default to > device=ch3:sock]. But you might want to figureout if there is a better > performing MPI for the amazon config. [perhaps mpich with nemesis > works well. Or perhaps openmpi. Both are available prebuit on > ubunutu...] > > Satish > >> >> >> >> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >> >> the Google Compute Engine? >> >> >> >> I believe some extra components are needed for coordination, e.g. >> >> Kubernetes or Mesos (in turn requiring that the library be compiled >> >> within some sort of container, e.g. Docker), but I'm a bit lost amid >> >> all the options. From bsmith at mcs.anl.gov Mon Jan 4 17:10:42 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 17:10:42 -0600 Subject: [petsc-users] installation on cloud platform In-Reply-To: References: Message-ID: <49D88394-EF0B-405E-B1F8-342FA35948B8@mcs.anl.gov> Marco, There are competitors to the "regular" cloud machines like Amazon focused specifically on HPC that "come with" MPI and use high speed networks so are much like if you built a custom HPC machine. For example http://www.rescale.com/software/ I don't have direct experiences with any of these systems but suspect that if you really want to scale to a bunch of nodes you are likely far better off with the HPC cloud servers than with general purpose systems even if they have a higher cost (you get what you pay for). Barry > On Jan 4, 2016, at 4:48 PM, Marco Zocca wrote: > > Hi Satish, > > thank you for the input; > > I was really looking for something that lets one abstract out "where" > the code lives, so as to possibly work both in a single-node and > cluster setting. > > This is why a "container" approach sounds meaningful. Configure once, run many. > For message-passing codes such as our case, there's this > `docker-compose` [ https://docs.docker.com/compose ] which aggregates > the compilation and network setup steps. > I have found this approach [ > http://qnib.org/2015/04/14/qnibterminal-mpi-hello-world/ ] that runs > an MPI benchmark using `docker-compose`. The "Consul" library takes > care of the DNS resolution as far as I can tell, and SLURM is the > queue manager. > > The downsides: it's yet another third party tool (albeit a widespread > one), with yet another scripting syntax (very much similar but > incompatible with shell script). > Latencies will be much larger, I expect, and also one should pay a > much higher attention to security (building on top of someone else's > images, freely available from the Docker Hub, is tantamount to running > arbitrary code at compile time). > > There are however trusted Docker images, containing various > combinations of Linux distributions and software. > > I'm very much looking forward to continuing this discussion; > > Kind regards, > > Marco > > > >> Are you interested in using 1 node [with multiple cores] - or multiple >> nodes - aka cluster on amazon? 
>> >> For a single node - I don't think any additional config should be >> necesary [/etc/hosts - or ssh keys]. It should be same as any laptop >> config. >> >> Its possible that 'hostname' is not setup properly on amazon nodes - and >> MPICH misbehaves. In this case - you might need any entry to >> /etc/hosts. Perhaps something like: >> >> echo 127.0.0.1 `hostname` >> /etc/hosts >> >> If cluster - then there might be a tutorial to setup a proper cluster with AWS. Googles gives >> http://cs.smith.edu/dftwiki/index.php/Tutorial:_Create_an_MPI_Cluster_on_the_Amazon_Elastic_Cloud_%28EC2%29 >> >> BTW: --download-mpich is a convient way to install MPI. [we default to >> device=ch3:sock]. But you might want to figureout if there is a better >> performing MPI for the amazon config. [perhaps mpich with nemesis >> works well. Or perhaps openmpi. Both are available prebuit on >> ubunutu...] >> >> Satish >> >>>>> >>>>> has anyone here tried/managed to install PETSc on e.g. Amazon AWS or >>>>> the Google Compute Engine? >>>>> >>>>> I believe some extra components are needed for coordination, e.g. >>>>> Kubernetes or Mesos (in turn requiring that the library be compiled >>>>> within some sort of container, e.g. Docker), but I'm a bit lost amid >>>>> all the options. From zonexo at gmail.com Mon Jan 4 19:28:32 2016 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 5 Jan 2016 09:28:32 +0800 Subject: [petsc-users] Segmentation error when calling PetscBarrier Message-ID: <568B1C40.30600@gmail.com> Hi, I am trying to debug my CFD Fortran MPI code. I tried to add: call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" to do a rough check on where the error is. xx is a different number for each line. I found that whenever I add this line, the code aborts with segmentation error. I am using the Intel compiler. Is there any error with my usage? -- Thank you Yours sincerely, TAY wee-beng From knepley at gmail.com Mon Jan 4 19:31:30 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 4 Jan 2016 19:31:30 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B1C40.30600@gmail.com> References: <568B1C40.30600@gmail.com> Message-ID: On Mon, Jan 4, 2016 at 7:28 PM, TAY wee-beng wrote: > Hi, > > I am trying to debug my CFD Fortran MPI code. I tried to add: > > call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > > to do a rough check on where the error is. xx is a different number for > each line. > > I found that whenever I add this line, the code aborts with segmentation > error. > > I am using the Intel compiler. Is there any error with my usage? I don't think this makes sense since it will try to pull a communicator out of the NULL_OBJECT. Matt > > -- > Thank you > > Yours sincerely, > > TAY wee-beng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Mon Jan 4 19:41:28 2016 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 5 Jan 2016 09:41:28 +0800 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: References: <568B1C40.30600@gmail.com> Message-ID: <568B1F48.1050505@gmail.com> Hi Matt, In that case, what would be a good or accurate way to debug the MPI code? I'm trying to determine where the fault lies. 
Thank you Yours sincerely, TAY wee-beng On 5/1/2016 9:31 AM, Matthew Knepley wrote: > On Mon, Jan 4, 2016 at 7:28 PM, TAY wee-beng > wrote: > > Hi, > > I am trying to debug my CFD Fortran MPI code. I tried to add: > > call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > > to do a rough check on where the error is. xx is a different > number for each line. > > I found that whenever I add this line, the code aborts with > segmentation error. > > I am using the Intel compiler. Is there any error with my usage? > > > I don't think this makes sense since it will try to pull a > communicator out of the NULL_OBJECT. > > Matt > > > -- > Thank you > > Yours sincerely, > > TAY wee-beng > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 4 19:51:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 4 Jan 2016 19:51:47 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B1F48.1050505@gmail.com> References: <568B1C40.30600@gmail.com> <568B1F48.1050505@gmail.com> Message-ID: On Mon, Jan 4, 2016 at 7:41 PM, TAY wee-beng wrote: > Hi Matt, > > In that case, what would be a good or accurate way to debug the MPI code? > I'm trying to determine where the fault lies. > Is there a problem with -start_in_debugger? Also valgrind --trace-children=yes is great. Thanks, Matt > Thank you > > Yours sincerely, > > TAY wee-beng > > On 5/1/2016 9:31 AM, Matthew Knepley wrote: > > On Mon, Jan 4, 2016 at 7:28 PM, TAY wee-beng wrote: > >> Hi, >> >> I am trying to debug my CFD Fortran MPI code. I tried to add: >> >> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" >> >> to do a rough check on where the error is. xx is a different number for >> each line. >> >> I found that whenever I add this line, the code aborts with segmentation >> error. >> >> I am using the Intel compiler. Is there any error with my usage? > > > I don't think this makes sense since it will try to pull a communicator > out of the NULL_OBJECT. > > Matt > > >> >> -- >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 4 20:32:05 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 20:32:05 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B1C40.30600@gmail.com> References: <568B1C40.30600@gmail.com> Message-ID: <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. Barry The debugger is not a scary monster, it is one of your best friends. > On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: > > Hi, > > I am trying to debug my CFD Fortran MPI code. I tried to add: > > call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > > to do a rough check on where the error is. 
xx is a different number for each line. > > I found that whenever I add this line, the code aborts with segmentation error. > > I am using the Intel compiler. Is there any error with my usage? > > -- > Thank you > > Yours sincerely, > > TAY wee-beng > From zonexo at gmail.com Mon Jan 4 21:34:23 2016 From: zonexo at gmail.com (TAY wee-beng) Date: Tue, 5 Jan 2016 11:34:23 +0800 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> References: <568B1C40.30600@gmail.com> <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> Message-ID: <568B39BF.7020108@gmail.com> Hi, Ya sorry, that should be the tool to use. Was having some problems using MPI with the debugger. I managed to run it as a serial code now. My problem is that on the cluster, it works with the gnu fortran. But using Intel compiler, I get segmentation error at some point when running the opt ver. The debug ver works fine. I am trying to find if the error is due to a bug in Intel, or it's my own problem. Another thing is that on another cluster, the Intel opt ver works, but that's using a newer ver of the compiler. I hope to get the Intel one working if possible, because it's about 30% faster. So now coming back to the gdb, it worked fine using the debug ver of the code. But when using the opt ver, it only shows segmentation fault. When the X-window appears, the code has already exited. I am already using -g during compile. So how should I debug it? The error seems to be when I tried to call DMDAVecRestoreArrayF90, although I still need to be more certain. Thank you Yours sincerely, TAY wee-beng On 5/1/2016 10:32 AM, Barry Smith wrote: > You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. > > Barry > > The debugger is not a scary monster, it is one of your best friends. > >> On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: >> >> Hi, >> >> I am trying to debug my CFD Fortran MPI code. I tried to add: >> >> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" >> >> to do a rough check on where the error is. xx is a different number for each line. >> >> I found that whenever I add this line, the code aborts with segmentation error. >> >> I am using the Intel compiler. Is there any error with my usage? >> >> -- >> Thank you >> >> Yours sincerely, >> >> TAY wee-beng >> From bsmith at mcs.anl.gov Mon Jan 4 22:16:19 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 4 Jan 2016 22:16:19 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <568B39BF.7020108@gmail.com> References: <568B1C40.30600@gmail.com> <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> <568B39BF.7020108@gmail.com> Message-ID: <348325AD-BEDD-4F06-A040-0A17B8C461BA@mcs.anl.gov> You could try instead -on_error_attach_debugger and see if that is better at catching the code when the error occurs If the code runs valgrind clean (when it runs) then I would say it is reasonable for you to conclude that the current trouble is due to a Intel optimization error and not debug further, Barry > On Jan 4, 2016, at 9:34 PM, TAY wee-beng wrote: > > Hi, > > Ya sorry, that should be the tool to use. Was having some problems using MPI with the debugger. > > I managed to run it as a serial code now. > > My problem is that on the cluster, it works with the gnu fortran. But using Intel compiler, I get segmentation error at some point when running the opt ver. The debug ver works fine. 
> > I am trying to find if the error is due to a bug in Intel, or it's my own problem. > > Another thing is that on another cluster, the Intel opt ver works, but that's using a newer ver of the compiler. > > I hope to get the Intel one working if possible, because it's about 30% faster. > > So now coming back to the gdb, it worked fine using the debug ver of the code. But when using the opt ver, it only shows segmentation fault. When the X-window appears, the code has already exited. I am already using -g during compile. > > So how should I debug it? The error seems to be when I tried to call DMDAVecRestoreArrayF90, although I still need to be more certain. > > > Thank you > > Yours sincerely, > > TAY wee-beng > > On 5/1/2016 10:32 AM, Barry Smith wrote: >> You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. >> >> Barry >> >> The debugger is not a scary monster, it is one of your best friends. >> >>> On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: >>> >>> Hi, >>> >>> I am trying to debug my CFD Fortran MPI code. I tried to add: >>> >>> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" >>> >>> to do a rough check on where the error is. xx is a different number for each line. >>> >>> I found that whenever I add this line, the code aborts with segmentation error. >>> >>> I am using the Intel compiler. Is there any error with my usage? >>> >>> -- >>> Thank you >>> >>> Yours sincerely, >>> >>> TAY wee-beng >>> > From balay at mcs.anl.gov Mon Jan 4 22:23:45 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 4 Jan 2016 22:23:45 -0600 Subject: [petsc-users] Segmentation error when calling PetscBarrier In-Reply-To: <348325AD-BEDD-4F06-A040-0A17B8C461BA@mcs.anl.gov> References: <568B1C40.30600@gmail.com> <26BF756B-7B4E-4630-B2BC-B95A68F60EDC@mcs.anl.gov> <568B39BF.7020108@gmail.com> <348325AD-BEDD-4F06-A040-0A17B8C461BA@mcs.anl.gov> Message-ID: verylikely valgrind will find bugs in code.. http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > When the X-window appears, the code has already exited try --debugger_pause 60 [or higher if it takes longer to swawn the xterms] Satish On Mon, 4 Jan 2016, Barry Smith wrote: > > You could try instead -on_error_attach_debugger and see if that is better at catching the code when the error occurs > > If the code runs valgrind clean (when it runs) then I would say it is reasonable for you to conclude that the current trouble is due to a Intel optimization error and not debug further, > > Barry > > > On Jan 4, 2016, at 9:34 PM, TAY wee-beng wrote: > > > > Hi, > > > > Ya sorry, that should be the tool to use. Was having some problems using MPI with the debugger. > > > > I managed to run it as a serial code now. > > > > My problem is that on the cluster, it works with the gnu fortran. But using Intel compiler, I get segmentation error at some point when running the opt ver. The debug ver works fine. > > > > I am trying to find if the error is due to a bug in Intel, or it's my own problem. > > > > Another thing is that on another cluster, the Intel opt ver works, but that's using a newer ver of the compiler. > > > > I hope to get the Intel one working if possible, because it's about 30% faster. > > > > So now coming back to the gdb, it worked fine using the debug ver of the code. But when using the opt ver, it only shows segmentation fault. When the X-window appears, the code has already exited. I am already using -g during compile. > > > > So how should I debug it? 
The error seems to be when I tried to call DMDAVecRestoreArrayF90, although I still need to be more certain. > > > > > > Thank you > > > > Yours sincerely, > > > > TAY wee-beng > > > > On 5/1/2016 10:32 AM, Barry Smith wrote: > >> You are missing the ,ierr which BTW you would have caught immediately if you used the debugger. > >> > >> Barry > >> > >> The debugger is not a scary monster, it is one of your best friends. > >> > >>> On Jan 4, 2016, at 7:28 PM, TAY wee-beng wrote: > >>> > >>> Hi, > >>> > >>> I am trying to debug my CFD Fortran MPI code. I tried to add: > >>> > >>> call PetscBarrier(PETSC_NULL_OBJECT); if (myid==0) print *, "xx" > >>> > >>> to do a rough check on where the error is. xx is a different number for each line. > >>> > >>> I found that whenever I add this line, the code aborts with segmentation error. > >>> > >>> I am using the Intel compiler. Is there any error with my usage? > >>> > >>> -- > >>> Thank you > >>> > >>> Yours sincerely, > >>> > >>> TAY wee-beng > >>> > > > > From amneetb at live.unc.edu Tue Jan 5 17:14:16 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Tue, 5 Jan 2016 23:14:16 +0000 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock Message-ID: Hi Folks, Is it safe to call MatDestroy on the sequential matrix returned by MatGetDiagonalBlock() after it?s no longer used? Thanks, ? Amneet ===================================================== Amneet Bhalla Postdoctoral Research Associate Department of Mathematics and McAllister Heart Institute University of North Carolina at Chapel Hill Email: amneet at unc.edu Web: https://abhalla.web.unc.edu ===================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Jan 5 17:20:14 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 6 Jan 2016 00:20:14 +0100 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: Message-ID: The manpage http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetDiagonalBlock.html indicates the reference counter on the returned matrix (a) isn't incremented. This statement would imply that in the absence of calling PetscObjectReference() yourself, you should not call MatDestroy() on the matrix returned. If you do call MatDestroy(), a double free will occur when you call MatDestroy() on the parent matrix from which you pulled the block matrix out of. Cheers, Dave On 6 January 2016 at 00:14, Bhalla, Amneet Pal S wrote: > Hi Folks, > > Is it safe to call MatDestroy on the sequential matrix returned by > MatGetDiagonalBlock() after it?s no longer used? > > > Thanks, > > ? Amneet > ===================================================== > Amneet Bhalla > Postdoctoral Research Associate > Department of Mathematics and McAllister Heart Institute > University of North Carolina at Chapel Hill > Email: amneet at unc.edu > Web: https://abhalla.web.unc.edu > ===================================================== > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Jan 5 17:21:01 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 5 Jan 2016 17:21:01 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: Message-ID: Looking at example usages in src/ksp/pc/impls/bjacobi/bjacobi.c or src/ksp/pc/impls/gasm/gasm.c - there is no call to MatDestroy.. 
[or MatRestoreDiagonalBlock] Satish On Tue, 5 Jan 2016, Bhalla, Amneet Pal S wrote: > Hi Folks, > > Is it safe to call MatDestroy on the sequential matrix returned by MatGetDiagonalBlock() after it?s no longer used? > > > Thanks, > > ? Amneet > ===================================================== > Amneet Bhalla > Postdoctoral Research Associate > Department of Mathematics and McAllister Heart Institute > University of North Carolina at Chapel Hill > Email: amneet at unc.edu > Web: https://abhalla.web.unc.edu > ===================================================== > > From amneetb at live.unc.edu Tue Jan 5 17:24:33 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Tue, 5 Jan 2016 23:24:33 +0000 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: Message-ID: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> On Jan 5, 2016, at 3:20 PM, Dave May > wrote: This statement would imply that in the absence of calling PetscObjectReference() yourself, you should not call MatDestroy() on the matrix returned Got it. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 5 17:32:18 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 5 Jan 2016 17:32:18 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> Message-ID: In general XXXGetYYY() do not increase the reference count and you should not destroy. Some XXXGetYYY() have a corresponding XXXRestoreYYY(). XXXCreateYYY() DO increase the reference count and should have destroy called. So get -> no destroy create -> destroy Barry In the past we were not consistent between the usages but now I think it is consistent. > On Jan 5, 2016, at 5:24 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 5, 2016, at 3:20 PM, Dave May wrote: >> >> This statement would imply that in the absence of calling PetscObjectReference() yourself, you should not call MatDestroy() on the matrix returned > > Got it. Thanks! > From amneetb at live.unc.edu Tue Jan 5 17:46:58 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Tue, 5 Jan 2016 23:46:58 +0000 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> Message-ID: <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> On Jan 5, 2016, at 3:32 PM, Barry Smith > wrote: So get -> no destroy create -> destroy Is MatGetSubMatrices() exception to this rule? The manual says to call the destroy() function after done with it. http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 5 17:53:40 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 5 Jan 2016 17:53:40 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> Message-ID: <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> Yeah, looks like MatGetSubMatrix() and MatGetSubMatrices() didn't get renamed to the "current" approach. 
Barry > On Jan 5, 2016, at 5:46 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 5, 2016, at 3:32 PM, Barry Smith wrote: >> >> So get -> no destroy >> create -> destroy > > Is MatGetSubMatrices() exception to this rule? The manual says to call the destroy() function after done with it. > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html From knepley at gmail.com Tue Jan 5 19:12:32 2016 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 5 Jan 2016 19:12:32 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> Message-ID: How devastating would it be for Deal.II if we renamed them MatCreateSubMatrix()? ;) Matt On Tue, Jan 5, 2016 at 5:53 PM, Barry Smith wrote: > > Yeah, looks like MatGetSubMatrix() and MatGetSubMatrices() didn't get > renamed to the "current" approach. > > Barry > > > On Jan 5, 2016, at 5:46 PM, Bhalla, Amneet Pal S > wrote: > > > > > > > >> On Jan 5, 2016, at 3:32 PM, Barry Smith wrote: > >> > >> So get -> no destroy > >> create -> destroy > > > > Is MatGetSubMatrices() exception to this rule? The manual says to call > the destroy() function after done with it. > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Jan 5 19:29:51 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 05 Jan 2016 18:29:51 -0700 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> Message-ID: <87twmr3374.fsf@jedbrown.org> Matthew Knepley writes: > How devastating would it be for Deal.II if we renamed them > MatCreateSubMatrix()? ;) I know it's consistent with respect to reference counting semantics, but it might be harder for new users to find when searching the docs. I have no data either way. I recall discussing years ago that having paired MatGetSubMatrix/MatRestoreSubMatrix would simplify bookkeeping in PCFieldSplit and some common user code. If the name is changed, it's easy to support the deprecated name from C. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jan 5 19:36:47 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 5 Jan 2016 19:36:47 -0600 Subject: [petsc-users] Calling MatDestroy on MatGetDiagonalBlock In-Reply-To: <87twmr3374.fsf@jedbrown.org> References: <99C59215-FEAD-4696-AE56-F00DF1819FF5@ad.unc.edu> <8A228187-1D01-4EEA-8607-46B4AC87A2AE@ad.unc.edu> <487B7BAC-B727-45BA-8B7C-3EF33145E60F@mcs.anl.gov> <87twmr3374.fsf@jedbrown.org> Message-ID: <84C0B347-61B0-4031-A1AB-341C9C6B0B35@mcs.anl.gov> > On Jan 5, 2016, at 7:29 PM, Jed Brown wrote: > > Matthew Knepley writes: > >> How devastating would it be for Deal.II if we renamed them >> MatCreateSubMatrix()? 
;) > > I know it's consistent with respect to reference counting semantics, but > it might be harder for new users to find when searching the docs. I > have no data either way. I recall discussing years ago that having > paired MatGetSubMatrix/MatRestoreSubMatrix would simplify bookkeeping in > PCFieldSplit and some common user code. Hmm, I don't recall this at all but it sounds like an intriguing idea. Barry > > If the name is changed, it's easy to support the deprecated name from C. From jychang48 at gmail.com Tue Jan 5 20:53:03 2016 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 5 Jan 2016 19:53:03 -0700 Subject: [petsc-users] Array of SNES's Message-ID: Hi all, Is it possible to create an array of SNES's? If I have a problem size N degrees of freedom, I want each dof to have its own SNES solver (basically a pointer to N SNES's). Reason for this is because I am performing a "post-processing" step where after my global solve, each entry of my solution vector of size N will go through some algebraic manipulation. If I did a standard LU solve for these individual SNES's, I could use the same snes and this issue would be moot. But i am using the Variational Inequality, which basically requires a fresh SNES for each problem. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Tue Jan 5 22:24:11 2016 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 5 Jan 2016 21:24:11 -0700 Subject: [petsc-users] Array of SNES's In-Reply-To: References: Message-ID: Timothee, No i haven't tried, mainly because I don't know how. Btw I am not doing this in C or FORTRAN, I want to do this in python (via petsc4py) since I am trying to make this compatible with Firedrake (which is also python-based). Thanks, Justin On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas wrote: > Hello and happy new year, > > Have you actually tried ? I just declared an array of 10 snes and created > them, and there is no complaint whatsoever. Also, something I do usually is > that I declare a derived type which contains some Petsc Objects (like SNES, > KSP, matrices, vectors, whatever), and create arrays of this derived types. > This works perfectly fine in my case (I use FORTRAN btw). > > Best wishes > > Timothee > > > 2016-01-06 11:53 GMT+09:00 Justin Chang : > >> Hi all, >> >> Is it possible to create an array of SNES's? If I have a problem size N >> degrees of freedom, I want each dof to have its own SNES solver (basically >> a pointer to N SNES's). Reason for this is because I am performing a >> "post-processing" step where after my global solve, each entry of my >> solution vector of size N will go through some algebraic manipulation. >> >> If I did a standard LU solve for these individual SNES's, I could use the >> same snes and this issue would be moot. But i am using the Variational >> Inequality, which basically requires a fresh SNES for each problem. >> >> Thanks, >> Justin >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Tue Jan 5 22:28:50 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Wed, 6 Jan 2016 13:28:50 +0900 Subject: [petsc-users] Array of SNES's In-Reply-To: References: Message-ID: (Sorry I forgot to answer all in the first message) Well I have never used Petsc in python, but in FORTRAN, it seems to work like any array. So why not use a python list for instance ? 
You would start with SNESs = [], create a new snes and append it to the list with SNESs.append(snes). Then you can use your list. That would not do it ? Timothee 2016-01-06 13:24 GMT+09:00 Justin Chang : > Timothee, > > No i haven't tried, mainly because I don't know how. Btw I am not doing > this in C or FORTRAN, I want to do this in python (via petsc4py) since I am > trying to make this compatible with Firedrake (which is also python-based). > > Thanks, > Justin > > On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Hello and happy new year, >> >> Have you actually tried ? I just declared an array of 10 snes and created >> them, and there is no complaint whatsoever. Also, something I do usually is >> that I declare a derived type which contains some Petsc Objects (like SNES, >> KSP, matrices, vectors, whatever), and create arrays of this derived types. >> This works perfectly fine in my case (I use FORTRAN btw). >> >> Best wishes >> >> Timothee >> >> >> 2016-01-06 11:53 GMT+09:00 Justin Chang : >> >>> Hi all, >>> >>> Is it possible to create an array of SNES's? If I have a problem size N >>> degrees of freedom, I want each dof to have its own SNES solver (basically >>> a pointer to N SNES's). Reason for this is because I am performing a >>> "post-processing" step where after my global solve, each entry of my >>> solution vector of size N will go through some algebraic manipulation. >>> >>> If I did a standard LU solve for these individual SNES's, I could use >>> the same snes and this issue would be moot. But i am using the Variational >>> Inequality, which basically requires a fresh SNES for each problem. >>> >>> Thanks, >>> Justin >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrick.sanan at gmail.com Tue Jan 5 22:30:30 2016 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 5 Jan 2016 20:30:30 -0800 Subject: [petsc-users] Array of SNES's In-Reply-To: References: Message-ID: <20160106043030.GA15679@Patricks-MacBook-Pro-4.local> Do all the SNES's need to be constructed at the same time? It will obviously require a lot of memory to store N SNES objects (or perhaps your N is small), and if they don't all need to exist simultaneously, then do you have the option to create and destroy one at a time as you loop over your grid points? On Tue, Jan 05, 2016 at 09:24:11PM -0700, Justin Chang wrote: > Timothee, > > No i haven't tried, mainly because I don't know how. Btw I am not doing > this in C or FORTRAN, I want to do this in python (via petsc4py) since I am > trying to make this compatible with Firedrake (which is also python-based). > > Thanks, > Justin > > On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas > wrote: > > > Hello and happy new year, > > > > Have you actually tried ? I just declared an array of 10 snes and created > > them, and there is no complaint whatsoever. Also, something I do usually is > > that I declare a derived type which contains some Petsc Objects (like SNES, > > KSP, matrices, vectors, whatever), and create arrays of this derived types. > > This works perfectly fine in my case (I use FORTRAN btw). > > > > Best wishes > > > > Timothee > > > > > > 2016-01-06 11:53 GMT+09:00 Justin Chang : > > > >> Hi all, > >> > >> Is it possible to create an array of SNES's? If I have a problem size N > >> degrees of freedom, I want each dof to have its own SNES solver (basically > >> a pointer to N SNES's). 
Reason for this is because I am performing a > >> "post-processing" step where after my global solve, each entry of my > >> solution vector of size N will go through some algebraic manipulation. > >> > >> If I did a standard LU solve for these individual SNES's, I could use the > >> same snes and this issue would be moot. But i am using the Variational > >> Inequality, which basically requires a fresh SNES for each problem. > >> > >> Thanks, > >> Justin > >> > > > > From jychang48 at gmail.com Wed Jan 6 01:29:04 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 6 Jan 2016 00:29:04 -0700 Subject: [petsc-users] Array of SNES's In-Reply-To: <20160106043030.GA15679@Patricks-MacBook-Pro-4.local> References: <20160106043030.GA15679@Patricks-MacBook-Pro-4.local> Message-ID: Okay so i think there's no need for this in my case. Doing a standard NewtonLS and using the same SNES was no issue at all. My original issue was dealing with the Variational Inequality at each grid point which seemed to break down unless I "reset" the SNES. But when I use these options: -snes_fd -ksp_type preonly -pc_type lu, it works perfectly if I use the same snes for all N grid points. Yes I sacrifice some time forming a FD, but it's not as great as creating new SNES objects each time. Strange Justin On Tue, Jan 5, 2016 at 9:30 PM, Patrick Sanan wrote: > Do all the SNES's need to be constructed at the same time? It will > obviously require a lot of memory to store N SNES objects (or perhaps > your N is small), and if they don't all need to exist simultaneously, then > do you have the option to > create and destroy one at a time as you loop over your grid points? > On Tue, Jan 05, 2016 at 09:24:11PM -0700, Justin Chang wrote: > > Timothee, > > > > No i haven't tried, mainly because I don't know how. Btw I am not doing > > this in C or FORTRAN, I want to do this in python (via petsc4py) since I > am > > trying to make this compatible with Firedrake (which is also > python-based). > > > > Thanks, > > Justin > > > > On Tue, Jan 5, 2016 at 8:57 PM, Timoth?e Nicolas < > timothee.nicolas at gmail.com > > > wrote: > > > > > Hello and happy new year, > > > > > > Have you actually tried ? I just declared an array of 10 snes and > created > > > them, and there is no complaint whatsoever. Also, something I do > usually is > > > that I declare a derived type which contains some Petsc Objects (like > SNES, > > > KSP, matrices, vectors, whatever), and create arrays of this derived > types. > > > This works perfectly fine in my case (I use FORTRAN btw). > > > > > > Best wishes > > > > > > Timothee > > > > > > > > > 2016-01-06 11:53 GMT+09:00 Justin Chang : > > > > > >> Hi all, > > >> > > >> Is it possible to create an array of SNES's? If I have a problem size > N > > >> degrees of freedom, I want each dof to have its own SNES solver > (basically > > >> a pointer to N SNES's). Reason for this is because I am performing a > > >> "post-processing" step where after my global solve, each entry of my > > >> solution vector of size N will go through some algebraic manipulation. > > >> > > >> If I did a standard LU solve for these individual SNES's, I could use > the > > >> same snes and this issue would be moot. But i am using the Variational > > >> Inequality, which basically requires a fresh SNES for each problem. > > >> > > >> Thanks, > > >> Justin > > >> > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From orxan.shibli at gmail.com Wed Jan 6 06:21:40 2016 From: orxan.shibli at gmail.com (Orxan Shibliyev) Date: Wed, 6 Jan 2016 14:21:40 +0200 Subject: [petsc-users] Shared library error Message-ID: I got the following error after my code worked until some time. I don't know why the error was given while the code was working properly but not in the beginning. Error message: ./out: /usr/lib64/libcrypto.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libssl.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libcrypto.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libssl.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libcrypto.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) ./out: /usr/lib64/libssl.so.10: no version information available (required by /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 6 07:43:14 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 6 Jan 2016 07:43:14 -0600 Subject: [petsc-users] Shared library error In-Reply-To: References: Message-ID: Does this prevent it from running? It just looks like those libraries have a documentation problem. Matt On Wed, Jan 6, 2016 at 6:21 AM, Orxan Shibliyev wrote: > I got the following error after my code worked until some time. I don't > know why the error was given while the code was working properly but not in > the beginning. > > Error message: > > ./out: /usr/lib64/libcrypto.so.10: no version information available > (required by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libssl.so.10: no version information available (required > by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libcrypto.so.10: no version information available > (required by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libssl.so.10: no version information available (required > by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libcrypto.so.10: no version information available > (required by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > ./out: /usr/lib64/libssl.so.10: no version information available (required > by > /truba/sw/centos6.4/lib/petsc/3.6.3-gcc-5.1.0-openmpi-1.8.5/lib/libpetsc.so.3.6) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From timothee.nicolas at gmail.com Thu Jan 7 07:49:47 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Thu, 7 Jan 2016 22:49:47 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free Message-ID: Hello everyone, I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? Thx Best Timoth?e [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Matrix type shell does not support getting diagonal block [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 7 08:06:56 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 7 Jan 2016 08:06:56 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: Message-ID: On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > Hello everyone, > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as > a preconditioner/smoother. The linear problem I am solving at this stage > lives in a subspace with 3 degrees of freedom, which represent the 3 > components of a 3D vector. In particular for multigrid, using BJACOBI > instead of JACOBI as a smoother changes everything in terms of efficiency. > I know it because I have tested with the actual matrix in matrix format for > my problem. However, eventually, I want to be matrix free. 
> > My question is, what are the operations I need to provide for the > matrix-free approach to accept BJACOBI ? I am confused because when I try > to apply BJACOBI to my matrix-free operator; the code asks for > MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my > understanding, returns a uniprocessor matrix representing the diagonal part > of the matrix on this processor (as defined in the manual). Instead, I > would expect that what is needed is a routine which returns a 3x3 matrix at > the grid point (that is, the block associated with this grid point, > coupling the 3 components of the vector together). How does this work ? Do > I simply need to code MatGetDiagonalBlock ? > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You would need to implement that function, or write a custom block Jacobi for this matrix. Thanks, Matt > > Thx > Best > > Timoth?e > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by > timothee Thu Jan 7 22:41:13 2016 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in > /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > [0]PETSC ERROR: #3 PCSetUp() line 982 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSPSetUp() line 332 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #5 KSPSolve() line 546 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothee.nicolas at gmail.com Thu Jan 7 08:08:46 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Thu, 7 Jan 2016 23:08:46 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: Message-ID: Ok, so it should be sufficient. Great, I think I can do it. Best Timoth?e 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > >> Hello everyone, >> >> I have discovered that I need to use Block Jacobi, rather than Jacobi, as >> a preconditioner/smoother. The linear problem I am solving at this stage >> lives in a subspace with 3 degrees of freedom, which represent the 3 >> components of a 3D vector. In particular for multigrid, using BJACOBI >> instead of JACOBI as a smoother changes everything in terms of efficiency. >> I know it because I have tested with the actual matrix in matrix format for >> my problem. However, eventually, I want to be matrix free. 
>> >> My question is, what are the operations I need to provide for the >> matrix-free approach to accept BJACOBI ? I am confused because when I try >> to apply BJACOBI to my matrix-free operator; the code asks for >> MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my >> understanding, returns a uniprocessor matrix representing the diagonal part >> of the matrix on this processor (as defined in the manual). Instead, I >> would expect that what is needed is a routine which returns a 3x3 matrix at >> the grid point (that is, the block associated with this grid point, >> coupling the 3 components of the vector together). How does this work ? Do >> I simply need to code MatGetDiagonalBlock ? >> > > Just like Jacobi does not request one diagonal element at a time, > Block-Jacobi does not request one diagonal block at a time. You > would need to implement that function, or write a custom block Jacobi for > this matrix. > > Thanks, > > Matt > > >> >> Thx >> Best >> >> Timoth?e >> >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: Matrix type shell does not support getting diagonal block >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 >> [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by >> timothee Thu Jan 7 22:41:13 2016 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >> --with-fc=gfortran --download-fblaslapack --download-mpich >> [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in >> /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c >> [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c >> [0]PETSC ERROR: #3 PCSetUp() line 982 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c >> [0]PETSC ERROR: #4 KSPSetUp() line 332 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: #5 KSPSolve() line 546 in >> /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 7 11:38:58 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 7 Jan 2016 11:38:58 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: Message-ID: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Timothee, You are mixing up block Jacobi PCBJACOBI (which in PETSc generally uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means all the degrees of freedom associated with a single grid point -- in your case 3). If you are doing matrix free with a shell matrix then you need to provide your own MatInvertBlockDiagonal() which in your case would invert each of your little 3 by 3 blocks and store the result in a 1d array; each little block in column major order followed by the next one. See for example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return a block size of 3. Barry > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas wrote: > > Ok, so it should be sufficient. 
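A rough C sketch of what Barry describes above for a shell matrix and point-block Jacobi. Everything here is invented for illustration (the context type ShellCtx, its ibdiag array, and the routine name); the thread also notes this hook is not reachable from Fortran. The two essential points from Barry's description are the column-major 3x3 inverses stored one after another in a single array, and the block size of 3 on the matrix.

#include <petscmat.h>

typedef struct {
  PetscInt     npoints;   /* number of local grid points              */
  PetscScalar *ibdiag;    /* 9*npoints entries: the inverted 3x3 blocks */
} ShellCtx;

static PetscErrorCode ShellInvertBlockDiagonal(Mat A,const PetscScalar **values)
{
  ShellCtx       *ctx;
  PetscErrorCode  ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A,&ctx);CHKERRQ(ierr);
  /* Fill ctx->ibdiag[9*i .. 9*i+8] with the inverse of the 3x3 block at grid
     point i, stored in column-major order, one block after the next.         */
  /* ... problem-specific inversion of each 3x3 block goes here ...           */
  *values = ctx->ibdiag;
  PetscFunctionReturn(0);
}

/* At setup time, after MatCreateShell():
     MatSetBlockSize(A,3);
     MatShellSetOperation(A,MATOP_INVERT_BLOCK_DIAGONAL,
                          (void (*)(void))ShellInvertBlockDiagonal);
   after which -pc_type pbjacobi should be able to use the shell matrix. */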
Great, I think I can do it. > > Best > > Timoth?e > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > Hello everyone, > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. > > My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? > > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You > would need to implement that function, or write a custom block Jacobi for this matrix. > > Thanks, > > Matt > > > Thx > Best > > Timoth?e > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From timothee.nicolas at gmail.com Thu Jan 7 19:58:38 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 8 Jan 2016 10:58:38 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but still I may need PCBJACOBI. 
The problem is that I don't seem to be allowed to define the matrix operation for MatGetDiagonalBlock... Indeed, I don't find MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h Therefore, when I try to define it, I get the following error at compilation (quite logically) matrices.F90(174): error #6404: This name does not have a type, and must have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] call MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) ---------------------------------------------^ Also, if I change my mind and instead decide to go for PCPBJACOBI, I still have a problem because the manual says that the routine you talk about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I cannot call it. I still cannot call it after I provide a routine corresponding to MATOP_INVERT_BLOCK_DIAGONAL. So, it seems to mean that if I want to use this kind of algorithms, I will have to hard code them, which would be too bad. Is that right, or is there an other way around these two issues ? Best Timothee 2016-01-08 2:38 GMT+09:00 Barry Smith : > > Timothee, > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally > uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means > all the degrees of freedom associated with a single grid point -- in your > case 3). > > If you are doing matrix free with a shell matrix then you need to > provide your own MatInvertBlockDiagonal() which in your case would invert > each of your little 3 by 3 blocks and store the result in a 1d array; each > little block in column major order followed by the next one. See for > example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return > a block size of 3. > > > Barry > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas > wrote: > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > Best > > > > Timoth?e > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > > Hello everyone, > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, > as a preconditioner/smoother. The linear problem I am solving at this stage > lives in a subspace with 3 degrees of freedom, which represent the 3 > components of a 3D vector. In particular for multigrid, using BJACOBI > instead of JACOBI as a smoother changes everything in terms of efficiency. > I know it because I have tested with the actual matrix in matrix format for > my problem. However, eventually, I want to be matrix free. > > > > My question is, what are the operations I need to provide for the > matrix-free approach to accept BJACOBI ? I am confused because when I try > to apply BJACOBI to my matrix-free operator; the code asks for > MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my > understanding, returns a uniprocessor matrix representing the diagonal part > of the matrix on this processor (as defined in the manual). Instead, I > would expect that what is needed is a routine which returns a 3x3 matrix at > the grid point (that is, the block associated with this grid point, > coupling the 3 components of the vector together). How does this work ? Do > I simply need to code MatGetDiagonalBlock ? > > > > Just like Jacobi does not request one diagonal element at a time, > Block-Jacobi does not request one diagonal block at a time. You > > would need to implement that function, or write a custom block Jacobi > for this matrix. 
> > > > Thanks, > > > > Matt > > > > > > Thx > > Best > > > > Timoth?e > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by > timothee Thu Jan 7 22:41:13 2016 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-fblaslapack --download-mpich > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in > /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > [0]PETSC ERROR: #3 PCSetUp() line 982 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #5 KSPSolve() line 546 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 7 20:06:29 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 7 Jan 2016 20:06:29 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: > On Jan 7, 2016, at 7:58 PM, Timoth?e Nicolas wrote: > > I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but still I may need PCBJACOBI. Note that using PCBJACOBI means you are providing big blocks of the Jacobian. If you do provide big blocks of the Jacobian you might as well just provide the entire Jacobin IMHO. Anyways the easiest way to do either PCPBJACOBI or PCBJACOBI is to explicitly construct the portion of the Jacobian you need, in a AIJ or BAIJ matrix and pass that as the SECOND matrix argument to KSPSetOperator() or SNESSetJacobian() then PETSc will use the piece you provide to build the preconditioner. So for example if you want PBJACOBI you would create a BAIJ matrix with block size 3 and only fill up the 3 by 3 block diagonal with Jacobian entries. Barry > The problem is that I don't seem to be allowed to define the matrix operation for MatGetDiagonalBlock... Indeed, I don't find > > MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h > > Therefore, when I try to define it, I get the following error at compilation (quite logically) > > matrices.F90(174): error #6404: This name does not have a type, and must have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] > call MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) > ---------------------------------------------^ > > Also, if I change my mind and instead decide to go for PCPBJACOBI, I still have a problem because the manual says that the routine you talk about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I cannot call it. 
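A hedged sketch of the BAIJ route Barry suggests above: build only the 3x3 diagonal blocks in an explicit matrix and hand it to the solver as the second (preconditioning) operator, keeping the shell matrix as the first. SetupPointBlockPC, Ashell, and the assembly loop are placeholders, and the preallocation assumes exactly one diagonal block per block row.

#include <petscksp.h>

/* Ashell: the matrix-free operator (a MATSHELL); nlocal: local rows, a multiple of 3. */
PetscErrorCode SetupPointBlockPC(KSP ksp,Mat Ashell,PetscInt nlocal)
{
  Mat            Pbdiag;
  PetscInt       p,rstart;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* one 3x3 block per block row, all on the diagonal, so very little extra memory */
  ierr = MatCreateBAIJ(PETSC_COMM_WORLD,3,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE,
                       1,NULL,0,NULL,&Pbdiag);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(Pbdiag,&rstart,NULL);CHKERRQ(ierr);
  for (p = 0; p < nlocal/3; ++p) {
    PetscScalar block[9] = {0.0};       /* replace with the 3x3 Jacobian block, column major */
    PetscInt    brow     = rstart/3+p;  /* global block-row index of this grid point          */
    ierr = MatSetValuesBlocked(Pbdiag,1,&brow,1,&brow,block,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(Pbdiag,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Pbdiag,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* matrix-free operator first, explicit block-diagonal piece second */
  ierr = KSPSetOperators(ksp,Ashell,Pbdiag);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With this in place, -pc_type pbjacobi (or bjacobi) builds the preconditioner from Pbdiag while the Krylov method still applies the matrix-free operator.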
I still cannot call it after I provide a routine corresponding to MATOP_INVERT_BLOCK_DIAGONAL. > > So, it seems to mean that if I want to use this kind of algorithms, I will have to hard code them, which would be too bad. Is that right, or is there an other way around these two issues ? > > Best > > Timothee > > > > > 2016-01-08 2:38 GMT+09:00 Barry Smith : > > Timothee, > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means all the degrees of freedom associated with a single grid point -- in your case 3). > > If you are doing matrix free with a shell matrix then you need to provide your own MatInvertBlockDiagonal() which in your case would invert each of your little 3 by 3 blocks and store the result in a 1d array; each little block in column major order followed by the next one. See for example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return a block size of 3. > > > Barry > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas wrote: > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > Best > > > > Timoth?e > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > > Hello everyone, > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. > > > > My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? > > > > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You > > would need to implement that function, or write a custom block Jacobi for this matrix. > > > > Thanks, > > > > Matt > > > > > > Thx > > Best > > > > Timoth?e > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > From timothee.nicolas at gmail.com Thu Jan 7 20:31:35 2016 From: timothee.nicolas at gmail.com (=?UTF-8?Q?Timoth=C3=A9e_Nicolas?=) Date: Fri, 8 Jan 2016 11:31:35 +0900 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: Ah, I understand, so by allocating this BAIJ in an intelligent way (allocating only the diagonal 3x3 blocks), I can still be basically memory efficient, and use matrix-free formulation for the first matrix in KSPSetOperator, right ? Timothee 2016-01-08 11:06 GMT+09:00 Barry Smith : > > > On Jan 7, 2016, at 7:58 PM, Timoth?e Nicolas > wrote: > > > > I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but > still I may need PCBJACOBI. > > Note that using PCBJACOBI means you are providing big blocks of the > Jacobian. If you do provide big blocks of the Jacobian you might as well > just provide the entire Jacobin IMHO. > > Anyways the easiest way to do either PCPBJACOBI or PCBJACOBI is to > explicitly construct the portion of the Jacobian you need, in a AIJ or BAIJ > matrix and pass that as the SECOND matrix argument to KSPSetOperator() or > SNESSetJacobian() then PETSc will use the piece you provide to build the > preconditioner. So for example if you want PBJACOBI you would create a BAIJ > matrix with block size 3 and only fill up the 3 by 3 block diagonal with > Jacobian entries. > > > Barry > > > > The problem is that I don't seem to be allowed to define the matrix > operation for MatGetDiagonalBlock... Indeed, I don't find > > > > MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h > > > > Therefore, when I try to define it, I get the following error at > compilation (quite logically) > > > > matrices.F90(174): error #6404: This name does not have a type, and must > have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] > > call > MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) > > ---------------------------------------------^ > > > > Also, if I change my mind and instead decide to go for PCPBJACOBI, I > still have a problem because the manual says that the routine you talk > about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I > cannot call it. I still cannot call it after I provide a routine > corresponding to MATOP_INVERT_BLOCK_DIAGONAL. 
> > > > So, it seems to mean that if I want to use this kind of algorithms, I > will have to hard code them, which would be too bad. Is that right, or is > there an other way around these two issues ? > > > > Best > > > > Timothee > > > > > > > > > > 2016-01-08 2:38 GMT+09:00 Barry Smith : > > > > Timothee, > > > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally > uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means > all the degrees of freedom associated with a single grid point -- in your > case 3). > > > > If you are doing matrix free with a shell matrix then you need to > provide your own MatInvertBlockDiagonal() which in your case would invert > each of your little 3 by 3 blocks and store the result in a 1d array; each > little block in column major order followed by the next one. See for > example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return > a block size of 3. > > > > > > Barry > > > > > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > > > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > > > Best > > > > > > Timoth?e > > > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas < > timothee.nicolas at gmail.com> wrote: > > > Hello everyone, > > > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, > as a preconditioner/smoother. The linear problem I am solving at this stage > lives in a subspace with 3 degrees of freedom, which represent the 3 > components of a 3D vector. In particular for multigrid, using BJACOBI > instead of JACOBI as a smoother changes everything in terms of efficiency. > I know it because I have tested with the actual matrix in matrix format for > my problem. However, eventually, I want to be matrix free. > > > > > > My question is, what are the operations I need to provide for the > matrix-free approach to accept BJACOBI ? I am confused because when I try > to apply BJACOBI to my matrix-free operator; the code asks for > MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my > understanding, returns a uniprocessor matrix representing the diagonal part > of the matrix on this processor (as defined in the manual). Instead, I > would expect that what is needed is a routine which returns a 3x3 matrix at > the grid point (that is, the block associated with this grid point, > coupling the 3 components of the vector together). How does this work ? Do > I simply need to code MatGetDiagonalBlock ? > > > > > > Just like Jacobi does not request one diagonal element at a time, > Block-Jacobi does not request one diagonal block at a time. You > > > would need to implement that function, or write a custom block Jacobi > for this matrix. > > > > > > Thanks, > > > > > > Matt > > > > > > > > > Thx > > > Best > > > > > > Timoth?e > > > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [0]PETSC ERROR: No support for this operation for this object type > > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal > block > > > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by > timothee Thu Jan 7 22:41:13 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-fblaslapack --download-mpich > > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in > /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > > [0]PETSC ERROR: #3 PCSetUp() line 982 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > [0]PETSC ERROR: #5 KSPSolve() line 546 in > /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 7 20:37:03 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 7 Jan 2016 20:37:03 -0600 Subject: [petsc-users] Block Jacobi for Matrix-free In-Reply-To: References: <360F0D42-75ED-419F-BFD2-6494AA67F9AA@mcs.anl.gov> Message-ID: <9563B240-921E-4D5D-AA86-E4D07C74362D@mcs.anl.gov> > On Jan 7, 2016, at 8:31 PM, Timoth?e Nicolas wrote: > > Ah, I understand, so by allocating this BAIJ in an intelligent way (allocating only the diagonal 3x3 blocks), I can still be basically memory efficient, and use matrix-free formulation for the first matrix in KSPSetOperator, right ? Exactly > > Timothee > > 2016-01-08 11:06 GMT+09:00 Barry Smith : > > > On Jan 7, 2016, at 7:58 PM, Timoth?e Nicolas wrote: > > > > I see, I just tested PCPBJACOBI, which is better than PCJACOBI, but still I may need PCBJACOBI. > > Note that using PCBJACOBI means you are providing big blocks of the Jacobian. If you do provide big blocks of the Jacobian you might as well just provide the entire Jacobin IMHO. > > Anyways the easiest way to do either PCPBJACOBI or PCBJACOBI is to explicitly construct the portion of the Jacobian you need, in a AIJ or BAIJ matrix and pass that as the SECOND matrix argument to KSPSetOperator() or SNESSetJacobian() then PETSc will use the piece you provide to build the preconditioner. So for example if you want PBJACOBI you would create a BAIJ matrix with block size 3 and only fill up the 3 by 3 block diagonal with Jacobian entries. > > > Barry > > > > The problem is that I don't seem to be allowed to define the matrix operation for MatGetDiagonalBlock... Indeed, I don't find > > > > MATOP_GET_DIAGONAL_BLOCK in ${PETSC_DIR}/include/petscmat.h > > > > Therefore, when I try to define it, I get the following error at compilation (quite logically) > > > > matrices.F90(174): error #6404: This name does not have a type, and must have an explicit type. [MATOP_GET_DIAGONAL_BLOCK] > > call MatShellSetOperation(lctx(1)%PSmat,MATOP_GET_DIAGONAL_BLOCK,PSmatGetDiagonalBlock,ierr) > > ---------------------------------------------^ > > > > Also, if I change my mind and instead decide to go for PCPBJACOBI, I still have a problem because the manual says that the routine you talk about, MatInvertBlockDiagonal, is not available from FORTRAN. Indeed I cannot call it. 
I still cannot call it after I provide a routine corresponding to MATOP_INVERT_BLOCK_DIAGONAL. > > > > So, it seems to mean that if I want to use this kind of algorithms, I will have to hard code them, which would be too bad. Is that right, or is there an other way around these two issues ? > > > > Best > > > > Timothee > > > > > > > > > > 2016-01-08 2:38 GMT+09:00 Barry Smith : > > > > Timothee, > > > > You are mixing up block Jacobi PCBJACOBI (which in PETSc generally uses "big" blocks) and point block Jacobi PCPBJACOBI (which generally means all the degrees of freedom associated with a single grid point -- in your case 3). > > > > If you are doing matrix free with a shell matrix then you need to provide your own MatInvertBlockDiagonal() which in your case would invert each of your little 3 by 3 blocks and store the result in a 1d array; each little block in column major order followed by the next one. See for example MatInvertBlockDiagonal_SeqAIJ(). You also need you matrix to return a block size of 3. > > > > > > Barry > > > > > > > > > On Jan 7, 2016, at 8:08 AM, Timoth?e Nicolas wrote: > > > > > > Ok, so it should be sufficient. Great, I think I can do it. > > > > > > Best > > > > > > Timoth?e > > > > > > 2016-01-07 23:06 GMT+09:00 Matthew Knepley : > > > On Thu, Jan 7, 2016 at 7:49 AM, Timoth?e Nicolas wrote: > > > Hello everyone, > > > > > > I have discovered that I need to use Block Jacobi, rather than Jacobi, as a preconditioner/smoother. The linear problem I am solving at this stage lives in a subspace with 3 degrees of freedom, which represent the 3 components of a 3D vector. In particular for multigrid, using BJACOBI instead of JACOBI as a smoother changes everything in terms of efficiency. I know it because I have tested with the actual matrix in matrix format for my problem. However, eventually, I want to be matrix free. > > > > > > My question is, what are the operations I need to provide for the matrix-free approach to accept BJACOBI ? I am confused because when I try to apply BJACOBI to my matrix-free operator; the code asks for MatGetDiagonalBlock (see error below). But MatGetDiagonalBlock, in my understanding, returns a uniprocessor matrix representing the diagonal part of the matrix on this processor (as defined in the manual). Instead, I would expect that what is needed is a routine which returns a 3x3 matrix at the grid point (that is, the block associated with this grid point, coupling the 3 components of the vector together). How does this work ? Do I simply need to code MatGetDiagonalBlock ? > > > > > > Just like Jacobi does not request one diagonal element at a time, Block-Jacobi does not request one diagonal block at a time. You > > > would need to implement that function, or write a custom block Jacobi for this matrix. > > > > > > Thanks, > > > > > > Matt > > > > > > > > > Thx > > > Best > > > > > > Timoth?e > > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > > [0]PETSC ERROR: No support for this operation for this object type > > > [0]PETSC ERROR: Matrix type shell does not support getting diagonal block > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> > > [0]PETSC ERROR: Petsc Release Version 3.6.1, Jul, 22, 2015 > > > [0]PETSC ERROR: ./miips on a arch-linux2-c-debug named Carl-9000 by timothee Thu Jan 7 22:41:13 2016 > > > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich > > > [0]PETSC ERROR: #1 MatGetDiagonalBlock() line 166 in /home/timothee/Documents/petsc-3.6.1/src/mat/interface/matrix.c > > > [0]PETSC ERROR: #2 PCSetUp_BJacobi() line 126 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/impls/bjacobi/bjacobi.c > > > [0]PETSC ERROR: #3 PCSetUp() line 982 in /home/timothee/Documents/petsc-3.6.1/src/ksp/pc/interface/precon.c > > > [0]PETSC ERROR: #4 KSPSetUp() line 332 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > [0]PETSC ERROR: #5 KSPSolve() line 546 in /home/timothee/Documents/petsc-3.6.1/src/ksp/ksp/interface/itfunc.c > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > > -- Norbert Wiener > > > > > > > > > From orxan.shibli at gmail.com Fri Jan 8 07:33:00 2016 From: orxan.shibli at gmail.com (Orxan Shibliyev) Date: Fri, 8 Jan 2016 15:33:00 +0200 Subject: [petsc-users] blas and lapack directory Message-ID: I am trying to configure petsc by giving blas and lapack directory myself. First of all, can I just give the directory of lapack for --with-blas-lapack-dir since blas is included in lapack. Secondly, I tried what I said above but I got the message that the folder I provided cannot be used. I am not sure if this is due to permission rights on cluster I work on or due to a configure mistake. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 8 07:37:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 8 Jan 2016 07:37:47 -0600 Subject: [petsc-users] blas and lapack directory In-Reply-To: References: Message-ID: For any configure question, you need to send configure.log Matt On Fri, Jan 8, 2016 at 7:33 AM, Orxan Shibliyev wrote: > I am trying to configure petsc by giving blas and lapack directory myself. > First of all, can I just give the directory of lapack > for --with-blas-lapack-dir since blas is included in lapack. Secondly, I > tried what I said above but I got the message that the folder I provided > cannot be used. I am not sure if this is due to permission rights on > cluster I work on or due to a configure mistake. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpovolot at purdue.edu Fri Jan 8 14:28:17 2016 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Fri, 08 Jan 2016 15:28:17 -0500 Subject: [petsc-users] question about SNESLINESEARCHBT Message-ID: <56901BE1.7010605@purdue.edu> Dear Petsc developers and users, I solve nonlinear systems with Newton Broyden method with Petsc. I use line search algorithm of type SNESLINESEARCHBT. Usually it works, but sometimes diverges. I want to change its parameters that are listed in http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLINESEARCHBT.html#SNESLINESEARCHBT Is there any way to set them in the code rather then from the command line? Thank you, Michael. 
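On the question just above, setting the SNESLINESEARCHBT parameters in code rather than on the command line: the reply that follows names the relevant calls, and a small hedged sketch of using them, with placeholder tolerance values, might look like the following.

#include <petscsnes.h>

/* Call after the SNES has been created; all numbers below are placeholders. */
PetscErrorCode ConfigureBTLineSearch(SNES snes)
{
  SNESLineSearch linesearch;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetLineSearch(snes,&linesearch);CHKERRQ(ierr);
  ierr = SNESLineSearchSetType(linesearch,SNESLINESEARCHBT);CHKERRQ(ierr);
  ierr = SNESLineSearchSetOrder(linesearch,SNES_LINESEARCH_ORDER_CUBIC);CHKERRQ(ierr);
  /* arguments: steptol, maxstep, rtol, atol, ltol, max_its */
  ierr = SNESLineSearchSetTolerances(linesearch,1.e-12,1.e8,1.e-8,1.e-15,1.e-8,40);CHKERRQ(ierr);
  ierr = SNESLineSearchSetLambda(linesearch,1.0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}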
-- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discover and Learning Research, Room 441 West Lafayette, IN 47907 Phone (765) 4949396 From bsmith at mcs.anl.gov Fri Jan 8 14:35:26 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 8 Jan 2016 14:35:26 -0600 Subject: [petsc-users] question about SNESLINESEARCHBT In-Reply-To: <56901BE1.7010605@purdue.edu> References: <56901BE1.7010605@purdue.edu> Message-ID: <828B8DC3-81B3-4EB9-9339-A4A834B08DF1@mcs.anl.gov> SNESGetLineSearch() then things like SNESLineSearchSetTolerances() SNESLineSearchSetLambda(), SNESLineSearchSetOrder() Barry > On Jan 8, 2016, at 2:28 PM, Michael Povolotskyi wrote: > > Dear Petsc developers and users, > I solve nonlinear systems with Newton Broyden method with Petsc. > I use line search algorithm of type SNESLINESEARCHBT. > Usually it works, but sometimes diverges. I want to change its parameters that are listed in > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLINESEARCHBT.html#SNESLINESEARCHBT > > Is there any way to set them in the code rather then from the command line? > Thank you, > Michael. > > -- > Michael Povolotskyi, PhD > Research Assistant Professor > Network for Computational Nanotechnology > Hall for Discover and Learning Research, Room 441 > West Lafayette, IN 47907 > Phone (765) 4949396 > From tabrezali at gmail.com Mon Jan 11 09:41:42 2016 From: tabrezali at gmail.com (Tabrez Ali) Date: Mon, 11 Jan 2016 09:41:42 -0600 Subject: [petsc-users] METIS without C++ compiler Message-ID: <5693CD36.10205@gmail.com> Hello I just wanted to point that configure fails when "--with-metis=1 --download-metis=1" options are used and a C++ compiler is not installed. After changing "project(METIS)" to "project(METIS C)" in petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/CMakeLists.txt it works alright. Not sure if there is another way to suppress the check. Regards, Tabrez ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error configuring METIS with cmake Could not execute "cd /home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/build && /usr/bin/cmake .. -DCMAKE_INSTALL_PREFIX=/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_C_COMPILER="/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpicc" -DCMAKE_AR=/usr/bin/ar -DCMAKE_RANLIB=/usr/bin/ranlib -DCMAKE_C_FLAGS:STRING="-fPIC -O" -DCMAKE_Fortran_COMPILER="/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpif90" -DCMAKE_Fortran_FLAGS:STRING="-fPIC -ffree-line-length-0 -O" -DGKLIB_PATH=../GKlib -DSHARED=1 -DMETIS_USE_DOUBLEPRECISION=1": -- The C compiler identification is GNU 4.8.4 -- The CXX compiler identification is unknown -- Check for working C compiler: /home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpicc -- Check for working C compiler: /home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/bin/mpicc -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Looking for execinfo.h -- Looking for execinfo.h - found -- Looking for getline -- Looking for getline - found -- Performing Test HAVE__thread -- Performing Test HAVE__thread - Success -- checking for __thread thread-local storage - found -- Configuring incomplete, errors occurred! 
See also "/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/build/CMakeFiles/CMakeOutput.log". See also "/home/ubuntu/petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/build/CMakeFiles/CMakeError.log".CMake Error: your CXX compiler: "CMAKE_CXX_COMPILER-NOTFOUND" was not found. Please set CMAKE_CXX_COMPILER to a valid compiler path or name. ******************************************************************************* From Shuangshuang.Jin at pnnl.gov Mon Jan 11 13:15:27 2016 From: Shuangshuang.Jin at pnnl.gov (Jin, Shuangshuang) Date: Mon, 11 Jan 2016 19:15:27 +0000 Subject: [petsc-users] PETSC_i Message-ID: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): double ival_re, ival_im; PetscScalar val; ... val = ival_re + PETSC_i * ival_im; I got a compilation error below: error: cannot convert 'std::complex' to 'PetscScalar {aka double}' in assignment Even if I set "val = 1.0 * PETSC_i;", the error stays the same. Can anyone help to evaluate the problem? Thanks, Shuangshuang -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jan 11 13:18:26 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 11 Jan 2016 12:18:26 -0700 Subject: [petsc-users] PETSC_i In-Reply-To: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> Message-ID: <87ziwbx6v1.fsf@jedbrown.org> "Jin, Shuangshuang" writes: > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): Looks like you're trying to compile with a different PETSc. Check PETSC_DIR and PETSC_ARCH. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From Shuangshuang.Jin at pnnl.gov Mon Jan 11 13:26:16 2016 From: Shuangshuang.Jin at pnnl.gov (Jin, Shuangshuang) Date: Mon, 11 Jan 2016 19:26:16 +0000 Subject: [petsc-users] PETSC_i In-Reply-To: <87ziwbx6v1.fsf@jedbrown.org> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> <87ziwbx6v1.fsf@jedbrown.org> Message-ID: <71FF54182841B443932BB8F835FD98531A59ACE3@EX10MBOX02.pnnl.gov> I didn't see anything wrong with my PETSC_DIR and PETSC_ARCH. Please see my setup below: setenv PETSC_DIR /pic/projects/software_new/petsc-3.6.0 setenv PETSC_ARCH linux-openmpi-gnu-cxx-complex-opt Thanks, Shuangshuang -----Original Message----- From: Jed Brown [mailto:jed at jedbrown.org] Sent: Monday, January 11, 2016 11:18 AM To: Jin, Shuangshuang; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC_i "Jin, Shuangshuang" writes: > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): Looks like you're trying to compile with a different PETSc. Check PETSC_DIR and PETSC_ARCH. From balay at mcs.anl.gov Mon Jan 11 13:48:29 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 11 Jan 2016 13:48:29 -0600 Subject: [petsc-users] PETSC_i In-Reply-To: <71FF54182841B443932BB8F835FD98531A59ACE3@EX10MBOX02.pnnl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> <87ziwbx6v1.fsf@jedbrown.org> <71FF54182841B443932BB8F835FD98531A59ACE3@EX10MBOX02.pnnl.gov> Message-ID: Are you sure its a complex build? 
Please send us configure.log or make.log for this build. Also send us test.log for this build. Satish On Mon, 11 Jan 2016, Jin, Shuangshuang wrote: > I didn't see anything wrong with my PETSC_DIR and PETSC_ARCH. > > Please see my setup below: > > setenv PETSC_DIR /pic/projects/software_new/petsc-3.6.0 > setenv PETSC_ARCH linux-openmpi-gnu-cxx-complex-opt > > Thanks, > Shuangshuang > > -----Original Message----- > From: Jed Brown [mailto:jed at jedbrown.org] > Sent: Monday, January 11, 2016 11:18 AM > To: Jin, Shuangshuang; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PETSC_i > > "Jin, Shuangshuang" writes: > > > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): > > Looks like you're trying to compile with a different PETSc. Check PETSC_DIR and PETSC_ARCH. > From bsmith at mcs.anl.gov Mon Jan 11 13:50:33 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 11 Jan 2016 13:50:33 -0600 Subject: [petsc-users] PETSC_i In-Reply-To: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> Message-ID: <99A40BEA-136F-426D-A4E4-9969304C85A5@mcs.anl.gov> > On Jan 11, 2016, at 1:15 PM, Jin, Shuangshuang wrote: > > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): > > double ival_re, ival_im; > PetscScalar val; > ? > val = ival_re + PETSC_i * ival_im; > > I got a compilation error below: > > error: cannot convert 'std::complex' to 'PetscScalar {aka double}' in assignment For sure something is wrong. It definitely believes that PetscScalar is a double when it will be std::complex if all the ducks are in order. Did/does make test work after you installed PETSc? Barry > > Even if I set ?val = 1.0 * PETSC_i;?, the error stays the same. > > Can anyone help to evaluate the problem? > > Thanks, > Shuangshuang From Shuangshuang.Jin at pnnl.gov Mon Jan 11 14:59:00 2016 From: Shuangshuang.Jin at pnnl.gov (Jin, Shuangshuang) Date: Mon, 11 Jan 2016 20:59:00 +0000 Subject: [petsc-users] PETSC_i In-Reply-To: <99A40BEA-136F-426D-A4E4-9969304C85A5@mcs.anl.gov> References: <71FF54182841B443932BB8F835FD98531A59ACC6@EX10MBOX02.pnnl.gov> <99A40BEA-136F-426D-A4E4-9969304C85A5@mcs.anl.gov> Message-ID: <71FF54182841B443932BB8F835FD98531A59AD2C@EX10MBOX02.pnnl.gov> Thanks, I reinstalled the PETSc to make sure the PetscScalar type is complex, and it works fine now. Shuangshuang -----Original Message----- From: Barry Smith [mailto:bsmith at mcs.anl.gov] Sent: Monday, January 11, 2016 11:51 AM To: Jin, Shuangshuang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC_i > On Jan 11, 2016, at 1:15 PM, Jin, Shuangshuang wrote: > > Hi, I have the following codes (The version of Petsc installed on my machine has PetscScalar set to be complex number): > > double ival_re, ival_im; > PetscScalar val; > ? > val = ival_re + PETSC_i * ival_im; > > I got a compilation error below: > > error: cannot convert 'std::complex' to 'PetscScalar {aka double}' in assignment For sure something is wrong. It definitely believes that PetscScalar is a double when it will be std::complex if all the ducks are in order. Did/does make test work after you installed PETSc? Barry > > Even if I set ?val = 1.0 * PETSC_i;?, the error stays the same. > > Can anyone help to evaluate the problem? 
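One way to catch this kind of mismatch at compile time, as a supplement to the checks suggested in this thread: PETSC_i is only meaningful when PetscScalar is complex, so guarding on PETSC_USE_COMPLEX fails immediately if PETSC_DIR/PETSC_ARCH point at a real-scalar build. A minimal sketch, with MakeComplex as an illustrative name:

#include <petscsys.h>

#if !defined(PETSC_USE_COMPLEX)
#  error "This code needs a PETSc built with --with-scalar-type=complex"
#endif

static PetscScalar MakeComplex(double ival_re,double ival_im)
{
  /* with a complex build this is complex arithmetic, so the assignment compiles as intended */
  return ival_re + PETSC_i*ival_im;
}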
> > Thanks, > Shuangshuang From gideon.simpson at gmail.com Mon Jan 11 15:26:35 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 11 Jan 2016 16:26:35 -0500 Subject: [petsc-users] SNES norm control Message-ID: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? -gideon -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckhuangf at gmail.com Mon Jan 11 20:53:43 2016 From: ckhuangf at gmail.com (Chung-Kan Huang) Date: Mon, 11 Jan 2016 20:53:43 -0600 Subject: [petsc-users] KSPConvergedReason = KSP_CONVERGED_ITERATING Message-ID: Hi, I am encountering KSPSolve hanging with one process finished KSPSolve reporting KSPConvergedReason = KSP_CONVERGED_ITERATING while other processes stuck in KSPSolve. The problem is not seen when code was compiled in debug mode and problem only appears after more than 10 hours of run time with production mode. Can anyone suggest how I can do to debug this case? Thanks, Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 11 21:03:32 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 11 Jan 2016 21:03:32 -0600 Subject: [petsc-users] KSPConvergedReason = KSP_CONVERGED_ITERATING In-Reply-To: References: Message-ID: On Mon, Jan 11, 2016 at 8:53 PM, Chung-Kan Huang wrote: > > Hi, > > I am encountering KSPSolve hanging with one process finished > KSPSolve reporting KSPConvergedReason = KSP_CONVERGED_ITERATING while other > processes stuck in KSPSolve. > > The problem is not seen when code was compiled in debug mode and problem > only appears after more than 10 hours of run time with production mode. > > Can anyone suggest how I can do to debug this case? > Can you send the complete solver being used? Something like the output of -ksp_view. Are you using a custom PC? Matt > Thanks, > > Ken > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 11 22:45:19 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 11 Jan 2016 22:45:19 -0600 Subject: [petsc-users] KSPConvergedReason = KSP_CONVERGED_ITERATING In-Reply-To: References: Message-ID: <14100ECD-1510-4DD5-8580-3298948504A5@mcs.anl.gov> Hmm, KSPSolve() should never complete with a KSP_CONVERGED_ITERATING so something is definitely not going well. Have you run your code with valgrind to make sure there is not some subtle memory bug that only rears its ugly head after a great deal of time? http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind I would do this first. 
It is possible to use the -g flag even with optimized builds to get debug symbols even with optimization (in fact that is our new default) so depending on the machine you are running on and how many MPI processes you use it could be possible to simply run the run in the debugger (-start_in_debugger) and then come back the next day when one process returns and the others hang then control c the other processes and see where they are in the code. Barry > On Jan 11, 2016, at 8:53 PM, Chung-Kan Huang wrote: > > > Hi, > > I am encountering KSPSolve hanging with one process finished KSPSolve reporting KSPConvergedReason = KSP_CONVERGED_ITERATING while other processes stuck in KSPSolve. > > The problem is not seen when code was compiled in debug mode and problem only appears after more than 10 hours of run time with production mode. > > Can anyone suggest how I can do to debug this case? > > Thanks, > > Ken > From bsmith at mcs.anl.gov Mon Jan 11 23:14:49 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 11 Jan 2016 23:14:49 -0600 Subject: [petsc-users] SNES norm control In-Reply-To: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> Message-ID: <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. Barry > On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: > > I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? > > -gideon > From jed at jedbrown.org Tue Jan 12 00:04:51 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 11 Jan 2016 23:04:51 -0700 Subject: [petsc-users] METIS without C++ compiler In-Reply-To: <5693CD36.10205@gmail.com> References: <5693CD36.10205@gmail.com> Message-ID: <87k2nfwcxo.fsf@jedbrown.org> Tabrez Ali writes: > Hello > > I just wanted to point that configure fails when "--with-metis=1 > --download-metis=1" options are used and a C++ compiler is not installed. > > After changing "project(METIS)" to "project(METIS C)" in > petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/CMakeLists.txt > it works alright. Thanks, it looks like ParMETIS needs this too. I've pushed this change to the repositories for each. Satish can bump the patch number and point PETSc to the new version. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From gideon.simpson at gmail.com Tue Jan 12 07:14:53 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Jan 2016 08:14:53 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> Message-ID: <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. 
What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements F_i/|x_i| where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? -gideon > On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: > > > You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. > > Barry > >> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >> >> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >> >> -gideon >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 12 07:24:13 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 12 Jan 2016 07:24:13 -0600 Subject: [petsc-users] SNES norm control In-Reply-To: <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> Message-ID: <5B2AC30A-FF15-4FA5-B0E1-6FA5E5975FF2@mcs.anl.gov> > On Jan 12, 2016, at 7:14 AM, Gideon Simpson wrote: > > That seems to to allow for me to cook up a convergence test in terms of the 2 norm. No, why just the two norm? You can put whatever tests you want into your convergence test, including looking at F_i/|x_i| if you want. You need to call SNESGetSolution() and SNESGetFunction() from within your test routine to get the vectors you want to look at. Barry > What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements > > F_i/|x_i| > > where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? > > > -gideon > >> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >> >> >> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >> >> Barry >> >>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >>> >>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? 
>>> >>> -gideon >>> >> > From dave.mayhem23 at gmail.com Tue Jan 12 07:24:29 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 12 Jan 2016 14:24:29 +0100 Subject: [petsc-users] SNES norm control In-Reply-To: <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> Message-ID: On 12 January 2016 at 14:14, Gideon Simpson wrote: > That seems to to allow for me to cook up a convergence test in terms of > the 2 norm. > While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. See http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html Cheers, Dave > What I?m really looking for is the ability to change things to be > something like the 2 norm of the vector with elements > > F_i/|x_i| > > where I am looking for a root of F(x). I can just build that scaling into > the form function, but is there a way to do it without rewriting that piece > of the code? > > > -gideon > > On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: > > > You can use SNESSetConvergenceTest() to use whatever test you want to > decide on convergence. > > Barry > > On Jan 11, 2016, at 3:26 PM, Gideon Simpson > wrote: > > I?m solving nonlinear problem for a complex valued function which is > decomposed into real and imaginary parts, Q = u + i v. What I?m finding is > that where |Q| is small, the numerical phase errors tend to be larger. I > suspect this is because it?s using the 2-norm for convergence in the SNES, > so, where the solution is already, the phase errors are seen as small too. > Is there a way to use something more like an infinity norm with SNES, to > get more point wise control? > > -gideon > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Tue Jan 12 07:33:00 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Jan 2016 08:33:00 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> Message-ID: <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? I interpreted this as though I had to build by convergence test based on those values. -gideon > On Jan 12, 2016, at 8:24 AM, Dave May wrote: > > > > On 12 January 2016 at 14:14, Gideon Simpson > wrote: > That seems to to allow for me to cook up a convergence test in terms of the 2 norm. > > While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. 
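For instance, a bare-bones test along those lines (the tolerance, iteration cap, and names below are just placeholders; error checking trimmed):

#include <petscsnes.h>

PetscErrorCode MyConvergenceTest(SNES snes, PetscInt it, PetscReal xnorm, PetscReal gnorm, PetscReal fnorm, SNESConvergedReason *reason, void *cctx)
{
  Vec            x, F;
  PetscReal      finf;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = SNESGetSolution(snes, &x);CHKERRQ(ierr);
  ierr = SNESGetFunction(snes, &F, NULL, NULL);CHKERRQ(ierr);
  /* any measure you like; here the max norm of F, but you could first scale F entrywise by |x| */
  ierr = VecNorm(F, NORM_INFINITY, &finf);CHKERRQ(ierr);
  *reason = SNES_CONVERGED_ITERATING;
  if (finf < 1.e-8)  *reason = SNES_CONVERGED_FNORM_ABS;
  else if (it >= 50) *reason = SNES_DIVERGED_MAX_IT;
  PetscFunctionReturn(0);
}

/* registered with: SNESSetConvergenceTest(snes, MyConvergenceTest, NULL, NULL); */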
> > See > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html > > Cheers, > Dave > > > > > What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements > > F_i/|x_i| > > where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? > > > -gideon > >> On Jan 12, 2016, at 12:14 AM, Barry Smith > wrote: >> >> >> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >> >> Barry >> >>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson > wrote: >>> >>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>> >>> -gideon >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Jan 12 07:37:16 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 12 Jan 2016 14:37:16 +0100 Subject: [petsc-users] SNES norm control In-Reply-To: <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: On 12 January 2016 at 14:33, Gideon Simpson wrote: > I?m just a bit confused by the documentation > for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f > are passed in, at the current iterate, correct? > Yes, but nothing requires you to use them :D > I interpreted this as though I had to build by convergence test based on > those values. > This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). Cheers, Dave > > -gideon > > On Jan 12, 2016, at 8:24 AM, Dave May wrote: > > > > On 12 January 2016 at 14:14, Gideon Simpson > wrote: > >> That seems to to allow for me to cook up a convergence test in terms of >> the 2 norm. >> > > While you are only provided the 2 norm of F, you are also given access to > the SNES object. Thus inside your user convergence test function, you can > call SNESGetFunction() and SNESGetSolution(), then you can compute your > convergence criteria and set the converged reason to what ever you want. > > See > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html > > Cheers, > Dave > > > > > >> What I?m really looking for is the ability to change things to be >> something like the 2 norm of the vector with elements >> >> F_i/|x_i| >> >> where I am looking for a root of F(x). 
I can just build that scaling >> into the form function, but is there a way to do it without rewriting that >> piece of the code? >> >> >> -gideon >> >> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >> >> >> You can use SNESSetConvergenceTest() to use whatever test you want to >> decide on convergence. >> >> Barry >> >> On Jan 11, 2016, at 3:26 PM, Gideon Simpson >> wrote: >> >> I?m solving nonlinear problem for a complex valued function which is >> decomposed into real and imaginary parts, Q = u + i v. What I?m finding is >> that where |Q| is small, the numerical phase errors tend to be larger. I >> suspect this is because it?s using the 2-norm for convergence in the SNES, >> so, where the solution is already, the phase errors are seen as small too. >> Is there a way to use something more like an infinity norm with SNES, to >> get more point wise control? >> >> -gideon >> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Tue Jan 12 08:06:38 2016 From: gideon.simpson at gmail.com (gideon.simpson at gmail.com) Date: Tue, 12 Jan 2016 09:06:38 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: Do I have to manually code in the divergence criteria too? > On Jan 12, 2016, at 8:37 AM, Dave May wrote: > > > >> On 12 January 2016 at 14:33, Gideon Simpson wrote: >> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? > > Yes, but nothing requires you to use them :D > >> I interpreted this as though I had to build by convergence test based on those values. > > This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. > > xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). > > Cheers, > Dave > >> >> -gideon >> >>> On Jan 12, 2016, at 8:24 AM, Dave May wrote: >>> >>> >>> >>> On 12 January 2016 at 14:14, Gideon Simpson wrote: >>>> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. >>> >>> While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. >>> >>> See >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >>> >>> Cheers, >>> Dave >>> >>> >>> >>> >>>> What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements >>>> >>>> F_i/|x_i| >>>> >>>> where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? 
>>>> >>>> >>>> -gideon >>>> >>>>> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >>>>> >>>>> >>>>> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >>>>> >>>>> Barry >>>>> >>>>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >>>>>> >>>>>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>>>>> >>>>>> -gideon > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Jan 12 08:17:23 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 12 Jan 2016 15:17:23 +0100 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: On 12 January 2016 at 15:06, wrote: > Do I have to manually code in the divergence criteria too? > Yes. By calling SNESSetConvergenceTest() you are replacing the default SNES convergence test function which will get called at each SNES iteration, therefore you are responsible for defining all reasons for convergence and divergence. To make life easy, you could copy everything in the funciton SNESConvergedDefault(), http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault and just replace the rule for SNES_CONVERGED_FNORM_RELATIVE with your custom scaled stopping condition. > > On Jan 12, 2016, at 8:37 AM, Dave May wrote: > > > > On 12 January 2016 at 14:33, Gideon Simpson > wrote: > >> I?m just a bit confused by the documentation >> for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f >> are passed in, at the current iterate, correct? >> > > Yes, but nothing requires you to use them :D > > >> I interpreted this as though I had to build by convergence test based on >> those values. >> > > This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm > and define any crazy stopping condition you like. > > xnorm, gnorm and fnorm are commonly required for many stopping conditions > and are computed by the snes methods. As such, are readily available and > for efficiency and convenience they are provided to the user (e.g. to avoid > you having to re-compute norms). > > Cheers, > Dave > > >> >> -gideon >> >> On Jan 12, 2016, at 8:24 AM, Dave May wrote: >> >> >> >> On 12 January 2016 at 14:14, Gideon Simpson >> wrote: >> >>> That seems to to allow for me to cook up a convergence test in terms of >>> the 2 norm. >>> >> >> While you are only provided the 2 norm of F, you are also given access to >> the SNES object. Thus inside your user convergence test function, you can >> call SNESGetFunction() and SNESGetSolution(), then you can compute your >> convergence criteria and set the converged reason to what ever you want. 
>> >> See >> >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >> >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >> >> Cheers, >> Dave >> >> >> >> >> >>> What I'm really looking for is the ability to change things to be >>> something like the 2 norm of the vector with elements >>> >>> F_i/|x_i| >>> >>> where I am looking for a root of F(x). I can just build that scaling >>> into the form function, but is there a way to do it without rewriting that >>> piece of the code? >>> >>> >>> -gideon >>> >>> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >>> >>> >>> You can use SNESSetConvergenceTest() to use whatever test you want to >>> decide on convergence. >>> >>> Barry >>> >>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson >>> wrote: >>> >>> I'm solving nonlinear problem for a complex valued function which is >>> decomposed into real and imaginary parts, Q = u + i v. What I'm finding is >>> that where |Q| is small, the numerical phase errors tend to be larger. I >>> suspect this is because it's using the 2-norm for convergence in the SNES, >>> so, where the solution is already, the phase errors are seen as small too. >>> Is there a way to use something more like an infinity norm with SNES, to >>> get more point wise control? >>> >>> -gideon >>> >>> >>> >>> >> >> > From borisbou at buffalo.edu Tue Jan 12 09:37:50 2016 From: borisbou at buffalo.edu (Boris Boutkov) Date: Tue, 12 Jan 2016 10:37:50 -0500 Subject: [petsc-users] Providing context to DMShell Message-ID: <56951DCE.2060401@buffalo.edu> Hello All, I'm trying to attach a context to the DMShell similarly to how a context is passed into SNES through the SetFunction routine. Specifically, I'm looking to provide my own interpolation routine to both DMShellSetCreateInterpolation and Injection, which requires some user data from my environment. I've tried searching around the _p_DM* struct looking for somewhere to attach this data but found no convenient way, any pointers to how I could achieve this would be appreciated. Thanks for your time, Boris Boutkov From lawrence.mitchell at imperial.ac.uk Tue Jan 12 09:44:01 2016 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 12 Jan 2016 15:44:01 +0000 Subject: [petsc-users] Providing context to DMShell In-Reply-To: <56951DCE.2060401@buffalo.edu> References: <56951DCE.2060401@buffalo.edu> Message-ID: <56951F41.2080007@imperial.ac.uk> On 12/01/16 15:37, Boris Boutkov wrote: > Hello All, > > I'm trying to attach a context to the DMShell similarly to how a context > is passed into SNES through the SetFunction routine. Specifically, I'm > looking to provide my own interpolation routine to both > DMShellSetCreateInterpolation and Injection, which requires some user > data from my environment. I've tried searching around the _p_DM* struct > looking for somewhere to attach this data but found no convenient way, > any pointers to how I could achieve this would be appreciated. I think you want to do: DMSetApplicationContext(dm, user_context); Inside your interpolation routine you can then use: PetscErrorCode interpolate(DM coarse, DM fine, Mat *m, Vec *v) { ... DMGetApplicationContext(coarse, &ctx); ... } Cheers, Lawrence
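A rough sketch of wiring that together with the DMShell hooks (the context struct, its fields, and the setup order below are only illustrative, not taken from this thread):

#include <petscdmshell.h>

typedef struct {
  PetscInt  nlevels;     /* whatever your environment needs inside the hooks */
  void     *user_data;
} MyShellCtx;

static PetscErrorCode MyCreateInterpolation(DM coarse, DM fine, Mat *mat, Vec *vec)
{
  MyShellCtx     *ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DMGetApplicationContext(coarse, &ctx);CHKERRQ(ierr);
  /* build the interpolation matrix in *mat (and optionally the scaling in *vec) using ctx */
  PetscFunctionReturn(0);
}

  /* ... during setup ... */
  ierr = DMShellCreate(comm, &dm);CHKERRQ(ierr);
  ierr = DMSetApplicationContext(dm, &my_ctx);CHKERRQ(ierr);
  ierr = DMShellSetCreateInterpolation(dm, MyCreateInterpolation);CHKERRQ(ierr);

The same context is then reachable from any other callback that is handed the DM.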
From balay at mcs.anl.gov Tue Jan 12 11:18:51 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 12 Jan 2016 11:18:51 -0600 Subject: [petsc-users] METIS without C++ compiler In-Reply-To: <87k2nfwcxo.fsf@jedbrown.org> References: <5693CD36.10205@gmail.com> <87k2nfwcxo.fsf@jedbrown.org> Message-ID: On Tue, 12 Jan 2016, Jed Brown wrote: > Tabrez Ali writes: > > > Hello > > > > I just wanted to point that configure fails when "--with-metis=1 > > --download-metis=1" options are used and a C++ compiler is not installed. > > > > After changing "project(METIS)" to "project(METIS C)" in > > petsc-3.6.3/arch-linux2-c-opt/externalpackages/metis-5.1.0-p1/CMakeLists.txt > > it works alright. > > Thanks, it looks like ParMETIS needs this too. I've pushed this change > to the repositories for each. Satish can bump the patch number and > point PETSc to the new version. > spun the new patched tarballs - and added the change in 'balay/to-maint-metis-parmetis-nocxx' and merged to next - for now. Satish From kshyatt at physics.ucsb.edu Tue Jan 12 15:20:05 2016 From: kshyatt at physics.ucsb.edu (Katharine Hyatt) Date: Tue, 12 Jan 2016 13:20:05 -0800 Subject: [petsc-users] HDF5Viewer only on worker 0? Message-ID: <43101E79-FA47-4546-A3C7-88916DDDF023@physics.ucsb.edu> Hello, I'm trying to use PETSc's HDF5Viewers on a system that doesn't support parallel HDF5. When I tried naively using PetscViewer hdf5viewer; PetscViewerHDF5Open( PETSC_COMM_WORLD, filename, FILE_MODE_WRITE, &hdf5viewer); I get a segfault because ADIOI can't lock. So I switched to using the binary format, which routes everything through one CPU. Then my job can output successfully. But I would like to use HDF5 without any intermediate steps, and reading the documentation it was unclear to me if it is possible to ask for behavior similar to the binary viewers from the HDF5 ones - everyone sends their information to worker 0, who then does single-process I/O. Is this possible? Thanks, Katharine From mfadams at lbl.gov Tue Jan 12 17:48:36 2016 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 12 Jan 2016 15:48:36 -0800 Subject: [petsc-users] osx configuration error Message-ID: I did nuke the arch directory. This has worked in the past and don't know what I might have changed. it's been awhile since I've reconfigured. Thanks, Mark -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 339545 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Jan 12 18:30:13 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 12 Jan 2016 18:30:13 -0600 Subject: [petsc-users] HDF5Viewer only on worker 0?
In-Reply-To: <43101E79-FA47-4546-A3C7-88916DDDF023@physics.ucsb.edu> References: <43101E79-FA47-4546-A3C7-88916DDDF023@physics.ucsb.edu> Message-ID: <9D1A2757-BAD2-4638-9FFB-92608570F017@mcs.anl.gov> Katherine, Assuming the vectors are not so large that the entire thing cannot fit on the first process you could do something like VecScatterCreateToZero(vec, &scatter,&veczero); VecScatterBegin/End(scatter,vec,veczero); if (!rank) { > PetscViewer hdf5viewer; > PetscViewerHDF5Open( PETSC_COMM_SELF, filename, FILE_MODE_WRITE, &hdf5viewer); VecView(veczero,hdf5viewer); } Note that if your vec came from a DMDA then you need to first do a DMDAGlobalToNaturalBegin/End() to get a vector in the right ordering to pass to VecScatterCreateToZero(). On the other hand if the vectors are enormous and cannot fit on one process it would be more involved. Essentially you would need to copy VecView_MPI_Binary() and modify it to write out to HDF5 a part at a time instead of the binary format it does now. Barry > On Jan 12, 2016, at 3:20 PM, Katharine Hyatt wrote: > > Hello, > > I'm trying to use PETSc's HDF5Viewers on a system that doesn't support parallel HDF5. When I tried naively using > > PetscViewer hdf5viewer; > PetscViewerHDF5Open( PETSC_COMM_WORLD, filename, FILE_MODE_WRITE, &hdf5viewer); > > I get a segfault because ADIOI can't lock. So I switched to using the binary format, which routes everything through one CPU. Then my job can output successfully. But I would like to use HDF5 without any intermediate steps, and reading the documentation it was unclear to me if it is possible to ask for behavior similar to the binary viewers from the HDF5 ones - everyone sends their information to worker 0, who then does single-process I/O. Is this possible? > > Thanks, > Katharine From balay at mcs.anl.gov Tue Jan 12 18:31:49 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 12 Jan 2016 18:31:49 -0600 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: > 'file' object has no attribute 'getvalue' File "/Users/markadams/Codes/petsc/config/configure.py", line 363, in petsc_configure Hm - have to figure this one out - but the primary issue is: > stderr: > gfortran: warning: couldn't understand kern.osversion '15.2.0 > ld: -rpath can only be used when targeting Mac OS X 10.5 or later Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. > Executing: mpif90 --version > stdout: > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 I suggest uninstalling/reinstalling homebrew packages. Satish On Tue, 12 Jan 2016, Mark Adams wrote: > I did nuke the arch directory. This has worked in the past and don't know > what I might have changed. it's been awhile since I've reconfigured. > Thanks, > Mark > From gideon.simpson at gmail.com Tue Jan 12 20:19:26 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Tue, 12 Jan 2016 21:19:26 -0500 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: Got it. I'm trying to build up my desired convergence test, based on the default routine.
I?m getting the following compiler error, which I don?t entirely understand: blowup_utils.c:180:9: error: incomplete definition of type 'struct _p_SNES' snes->ttol = fnorm_scaled*snes->rtol; ~~~~^ /opt/petsc/include/petscsnes.h:20:16: note: forward declaration of 'struct _p_SNES' typedef struct _p_SNES* SNES; Separately, is there a way to get the step vector? -gideon > On Jan 12, 2016, at 9:17 AM, Dave May wrote: > > > > On 12 January 2016 at 15:06, > wrote: > Do I have to manually code in the divergence criteria too? > > Yes. > > By calling SNESSetConvergenceTest() you are replacing the default SNES convergence test function which will get called at each SNES iteration, therefore you are responsible for defining all reasons for convergence and divergence. > > To make life easy, you could copy everything in the funciton SNESConvergedDefault(), > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault > > and just replace the rule for > SNES_CONVERGED_FNORM_RELATIVE > with your custom scaled stopping condition. > > > > > > > On Jan 12, 2016, at 8:37 AM, Dave May > wrote: > >> >> >> On 12 January 2016 at 14:33, Gideon Simpson > wrote: >> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? >> >> Yes, but nothing requires you to use them :D >> >> I interpreted this as though I had to build by convergence test based on those values. >> >> This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. >> >> xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). >> >> Cheers, >> Dave >> >> >> -gideon >> >>> On Jan 12, 2016, at 8:24 AM, Dave May > wrote: >>> >>> >>> >>> On 12 January 2016 at 14:14, Gideon Simpson > wrote: >>> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. >>> >>> While you are only provided the 2 norm of F, you are also given access to the SNES object. Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. >>> >>> See >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >>> >>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >>> >>> Cheers, >>> Dave >>> >>> >>> >>> >>> What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements >>> >>> F_i/|x_i| >>> >>> where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? >>> >>> >>> -gideon >>> >>>> On Jan 12, 2016, at 12:14 AM, Barry Smith > wrote: >>>> >>>> >>>> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >>>> >>>> Barry >>>> >>>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson > wrote: >>>>> >>>>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. 
I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>>>> >>>>> -gideon >>>>> >>>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Jan 12 21:55:27 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 12 Jan 2016 21:55:27 -0600 Subject: [petsc-users] SNES norm control In-Reply-To: References: <12AEF269-EF0A-4330-8924-0B3A02C234BB@gmail.com> <51EEA1EB-19FE-4F69-A4E9-EA8FCE323F42@mcs.anl.gov> <4C6FAD00-73E1-4435-940D-6482AB5DDA54@gmail.com> <6D8EB106-8809-409B-BC61-BEEBEF8AF820@gmail.com> Message-ID: <140D0B19-E93F-4D7D-9EDD-D0B6CB060656@mcs.anl.gov> > On Jan 12, 2016, at 8:19 PM, Gideon Simpson wrote: > > Got it. I?m trying to build up my desired convergence test, based on the default routine. I?m getting the following compiler error, which I don?t entirely understand: > > blowup_utils.c:180:9: error: > incomplete definition of type 'struct _p_SNES' > snes->ttol = fnorm_scaled*snes->rtol; > ~~~~^ > /opt/petsc/include/petscsnes.h:20:16: note: forward declaration of > 'struct _p_SNES' > typedef struct _p_SNES* SNES; Since you are accessing the internals of SNES you need to include > > Separately, is there a way to get the step vector? SNESGetSolutionUpdate() > > -gideon > >> On Jan 12, 2016, at 9:17 AM, Dave May wrote: >> >> >> >> On 12 January 2016 at 15:06, wrote: >> Do I have to manually code in the divergence criteria too? >> >> Yes. >> >> By calling SNESSetConvergenceTest() you are replacing the default SNES convergence test function which will get called at each SNES iteration, therefore you are responsible for defining all reasons for convergence and divergence. >> >> To make life easy, you could copy everything in the funciton SNESConvergedDefault(), >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESConvergedDefault.html#SNESConvergedDefault >> >> and just replace the rule for >> SNES_CONVERGED_FNORM_RELATIVE >> with your custom scaled stopping condition. >> >> >> >> >> >> >> On Jan 12, 2016, at 8:37 AM, Dave May wrote: >> >>> >>> >>> On 12 January 2016 at 14:33, Gideon Simpson wrote: >>> I?m just a bit confused by the documentation for SNESConvergenceTestFunction. the arguments for the xnorm, gnorm, and f are passed in, at the current iterate, correct? >>> >>> Yes, but nothing requires you to use them :D >>> >>> I interpreted this as though I had to build by convergence test based on those values. >>> >>> This is a misinterpretation. You can ignore all of xnorm, gnorm and fnorm and define any crazy stopping condition you like. >>> >>> xnorm, gnorm and fnorm are commonly required for many stopping conditions and are computed by the snes methods. As such, are readily available and for efficiency and convenience they are provided to the user (e.g. to avoid you having to re-compute norms). >>> >>> Cheers, >>> Dave >>> >>> >>> -gideon >>> >>>> On Jan 12, 2016, at 8:24 AM, Dave May wrote: >>>> >>>> >>>> >>>> On 12 January 2016 at 14:14, Gideon Simpson wrote: >>>> That seems to to allow for me to cook up a convergence test in terms of the 2 norm. >>>> >>>> While you are only provided the 2 norm of F, you are also given access to the SNES object. 
Thus inside your user convergence test function, you can call SNESGetFunction() and SNESGetSolution(), then you can compute your convergence criteria and set the converged reason to what ever you want. >>>> >>>> See >>>> >>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html >>>> >>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html >>>> >>>> Cheers, >>>> Dave >>>> >>>> >>>> >>>> >>>> What I?m really looking for is the ability to change things to be something like the 2 norm of the vector with elements >>>> >>>> F_i/|x_i| >>>> >>>> where I am looking for a root of F(x). I can just build that scaling into the form function, but is there a way to do it without rewriting that piece of the code? >>>> >>>> >>>> -gideon >>>> >>>>> On Jan 12, 2016, at 12:14 AM, Barry Smith wrote: >>>>> >>>>> >>>>> You can use SNESSetConvergenceTest() to use whatever test you want to decide on convergence. >>>>> >>>>> Barry >>>>> >>>>>> On Jan 11, 2016, at 3:26 PM, Gideon Simpson wrote: >>>>>> >>>>>> I?m solving nonlinear problem for a complex valued function which is decomposed into real and imaginary parts, Q = u + i v. What I?m finding is that where |Q| is small, the numerical phase errors tend to be larger. I suspect this is because it?s using the 2-norm for convergence in the SNES, so, where the solution is already, the phase errors are seen as small too. Is there a way to use something more like an infinity norm with SNES, to get more point wise control? >>>>>> >>>>>> -gideon >>>>>> >>>>> >>>> >>>> >>> >>> >> > From hgbk2008 at gmail.com Wed Jan 13 03:34:02 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Wed, 13 Jan 2016 10:34:02 +0100 Subject: [petsc-users] error on MatZeroRowsColumns Message-ID: Dear PETSc developers I got an error with MatZeroRowsColumns, which said there was one missing diagonal entries This is the full log message that I got: Mat Object: 2 MPI processes type: mpiaij rows=41064, cols=41064, bs=4 total: nonzeros=5.66069e+06, allocated nonzeros=1.28112e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 5647 nodes, limit used is 5 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Matrix is missing diagonal entry in row 7 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 [0]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed Jan 13 10:27:42 2016 [0]PETSC ERROR: Configure options --with-shared-libraries --with-debugging=0 --with-pic --with-clanguage=cxx --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes --download-mumps=yes --download-hypre=yes --download-ml=yes --download-klu=yes --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple --prefix=/home/hbui/opt/petsc-3.6.3 [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c [0]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Object is in wrong state [1]PETSC ERROR: Matrix is missing diagonal entry in row 7 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 [1]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed Jan 13 10:27:42 2016 [1]PETSC ERROR: Configure options --with-shared-libraries --with-debugging=0 --with-pic --with-clanguage=cxx --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes --download-parmetis=yes --download-scalapack=yes --download-mumps=yes --download-hypre=yes --download-ml=yes --download-klu=yes --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple --prefix=/home/hbui/opt/petsc-3.6.3 [1]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c [1]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c [1]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c [1]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c The problem is, before calling MatZeroRowsColumns, I searched for zero rows and set the respective diagonal: PetscInt Istart, Iend; MatGetOwnershipRange(rA.Get(), &Istart, &Iend); // loop through each row in the current partition for(PetscInt row = Istart; row < Iend; ++row) { int ncols; const PetscInt* cols; const PetscScalar* vals; MatGetRow(rA.Get(), row, &ncols, &cols, &vals); PetscScalar row_norm = 0.0; for(int i = 0; i < ncols; ++i) { PetscScalar val = vals[i]; row_norm += pow(val, 2); } row_norm = sqrt(row_norm); if(row_norm == 0.0) { for(int i = 0; i < ncols; ++i) { PetscInt col = cols[i]; if(col == row) { MatSetValue(rA.Get(), row, col, 1.0, INSERT_VALUES); } } } MatRestoreRow(rA.Get(), row, &ncols, &cols, &vals); } // cached the modification MatAssemblyBegin(rA.Get(), MAT_FINAL_ASSEMBLY); MatAssemblyEnd(rA.Get(), MAT_FINAL_ASSEMBLY); This should set all the missing diagonal for missing rows. But I could not figure out why the error message above happen. Any ideas? Regards Giang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Jan 13 04:15:53 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 13 Jan 2016 04:15:53 -0600 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay wrote: > > 'file' object has no attribute 'getvalue' File > "/Users/markadams/Codes/petsc/config/configure.py", line 363, in > petsc_configure > > Hm - have to figure this one out - but the primary issue is: > > > stderr: > > gfortran: warning: couldn't understand kern.osversion '15.2.0 > > ld: -rpath can only be used when targeting Mac OS X 10.5 or later > I get this. The remedy I use is to put MACOSX_DEPLOYMENT_TARGET=10.5 in the environment. Its annoying, and quintessentially Mac. Matt > Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. > > > Executing: mpif90 --version > > stdout: > > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 > > I suggest uninstalling/reinstalling homebrew packages. > > Satish > > > > On Tue, 12 Jan 2016, Mark Adams wrote: > > > I did nuke the arch directory. This has worked in the past and don't > know > > what I might have changed. it's been awhile since I've reconfigured. > > Thanks, > > Mark > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 13 04:17:36 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 13 Jan 2016 04:17:36 -0600 Subject: [petsc-users] error on MatZeroRowsColumns In-Reply-To: References: Message-ID: On Wed, Jan 13, 2016 at 3:34 AM, Hoang Giang Bui wrote: > Dear PETSc developers > > I got an error with MatZeroRowsColumns, which said there was one missing > diagonal entries > > This is the full log message that I got: > > Mat Object: 2 MPI processes > type: mpiaij > rows=41064, cols=41064, bs=4 > total: nonzeros=5.66069e+06, allocated nonzeros=1.28112e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 5647 nodes, limit used is 5 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Matrix is missing diagonal entry in row 7 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > [0]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed > Jan 13 10:27:42 2016 > [0]PETSC ERROR: Configure options --with-shared-libraries > --with-debugging=0 --with-pic --with-clanguage=cxx > --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes > --download-parmetis=yes --download-scalapack=yes --download-mumps=yes > --download-hypre=yes --download-ml=yes --download-klu=yes > --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple > --prefix=/home/hbui/opt/petsc-3.6.3 > [0]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > [0]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c > [0]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Object is in wrong state > [1]PETSC ERROR: Matrix is missing diagonal entry in row 7 > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > [1]PETSC ERROR: python on a arch-linux2-cxx-opt named bermuda by hbui2 Wed > Jan 13 10:27:42 2016 > [1]PETSC ERROR: Configure options --with-shared-libraries > --with-debugging=0 --with-pic --with-clanguage=cxx > --download-fblas-lapack=yes --download-ptscotch=yes --download-metis=yes > --download-parmetis=yes --download-scalapack=yes --download-mumps=yes > --download-hypre=yes --download-ml=yes --download-klu=yes > --download-pastix=yes --with-mpi-dir=/opt/openmpi-1.10.1_thread-multiple > --prefix=/home/hbui/opt/petsc-3.6.3 > [1]PETSC ERROR: #1 MatZeroRowsColumns_SeqAIJ() line 1901 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/seq/aij.c > [1]PETSC ERROR: #2 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > [1]PETSC ERROR: #3 MatZeroRowsColumns_MPIAIJ() line 908 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/impls/aij/mpi/mpiaij.c > [1]PETSC ERROR: #4 MatZeroRowsColumns() line 5476 in > /media/NEW_HOME/sw2/petsc-3.6.3/src/mat/interface/matrix.c > > > The problem is, before calling MatZeroRowsColumns, I searched for zero > rows and set the respective diagonal: > > PetscInt Istart, Iend; > MatGetOwnershipRange(rA.Get(), &Istart, &Iend); > > // loop through each row in the current partition > for(PetscInt row = Istart; row < Iend; ++row) > { > int ncols; > const PetscInt* cols; > const PetscScalar* vals; > MatGetRow(rA.Get(), row, &ncols, &cols, &vals); > > PetscScalar row_norm = 0.0; > for(int i = 0; i < ncols; ++i) > { > PetscScalar val = vals[i]; > row_norm += pow(val, 2); > } > row_norm = sqrt(row_norm); > > if(row_norm == 0.0) > { > for(int i = 0; i < ncols; ++i) > { > PetscInt col = cols[i]; > if(col == row) > { > MatSetValue(rA.Get(), row, col, 1.0, > INSERT_VALUES); > } > } > } > > MatRestoreRow(rA.Get(), row, &ncols, &cols, &vals); > } > > // cached the modification > MatAssemblyBegin(rA.Get(), MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(rA.Get(), MAT_FINAL_ASSEMBLY); > > This should set all the missing diagonal for missing rows. But I could not > figure out why the error message above happen. Any ideas? 
> My guess would be that you have a row missing the diagonal, but with other entries. Matt > Regards > Giang > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From damon at ices.utexas.edu Wed Jan 13 10:14:02 2016 From: damon at ices.utexas.edu (Damon McDougall) Date: Wed, 13 Jan 2016 10:14:02 -0600 Subject: [petsc-users] The 7th Annual Scientific Software Days Conference Message-ID: <1452701642.2145471.491082282.5126827B@webmail.messagingengine.com> The 7th Annual Scientific Software Days Conference (SSD) targets users and developers of scientific software. The conference will be held at the University of Texas at Austin Thursday Feb 25 - Friday Feb 26, 2016 and focuses on two themes: a) sharing best practices across scientific software communities; b) sharing the latest tools and technology relevant to scientific software. Past keynotes speakers include Greg Wilson (2008), Victoria Stodden (2009), Steve Easterbrook (2010), Fernando Perez (2011), Will Schroeder (2012), Neil Chue Hong (2013). This year's list of speakers include: - Brian Adams (Sandia, Dakota): http://www.sandia.gov/~briadam/index.html - Jed Brown (CU Boulder, PETSc): https://jedbrown.org/ - Tim Davis (TAMU, SuiteSparse): http://faculty.cse.tamu.edu/davis/welcome.html - Iain Dunning (MIT, Julia Project): http://iaindunning.com/ - Victor Eijkhout (TACC): http://pages.tacc.utexas.edu/~eijkhout/ - Robert van de Geijn (keynote, UT Austin, libflame): https://www.cs.utexas.edu/users/rvdg/ - Jeff Hammond (Intel, nwchem): https://jeffhammond.github.io/ - Mark Hoemmen (keynote, Sandia, Trilinos): https://plus.google.com/+MarkHoemmen - James Howison (UT Austin): http://james.howison.name/ - Fernando Perez (Berkeley, IPython): http://fperez.org/ - Cory Quammen (Kitware, Paraview/VTK): http://www.kitware.com/company/team/quammen.html - Ridgway Scott (UChicago, FEniCS): http://people.cs.uchicago.edu/~ridg/ - Roy Stogner (UT Austin, LibMesh): https://scholar.google.com/citations?user=XcurJI0AAAAJ In addition, we solicit poster submissions that share novel uses of scientific software. Please send an abstract of less than 250 words to ssd-organizers at googlegroups.com. Limited travel funding for students and early career researchers who present posters will be available. Early-bird registration fees (before Feb 10th): Students: $35 Everyone else: $50 Late registration fees (Feb 10th onwards): Students: $55 Everyone else: $70 Register here: http://scisoftdays.org/ Regards, S. Fomel (UTexas), T. Isaac (UChicago), M. Knepley (Rice), R. Kirby (Baylor), Y. Lai (UTexas), K. Long (Texas Tech), D. McDougall (UTexas), J. Stewart (Sandia) From mfadams at lbl.gov Wed Jan 13 11:43:09 2016 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 13 Jan 2016 09:43:09 -0800 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: I'm still having problems. I have upgraded gcc and mpich. I am now upgrading everything from homebrew. Any ideas on this error? 
thanks, On Wed, Jan 13, 2016 at 2:15 AM, Matthew Knepley wrote: > On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay wrote: > >> > 'file' object has no attribute 'getvalue' File >> "/Users/markadams/Codes/petsc/config/configure.py", line 363, in >> petsc_configure >> >> Hm - have to figure this one out - but the primary issue is: >> >> > stderr: >> > gfortran: warning: couldn't understand kern.osversion '15.2.0 >> > ld: -rpath can only be used when targeting Mac OS X 10.5 or later >> > > I get this. The remedy I use is to put > > MACOSX_DEPLOYMENT_TARGET=10.5 > > in the environment. Its annoying, and quintessentially Mac. > > Matt > > >> Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. >> >> > Executing: mpif90 --version >> > stdout: >> > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 >> >> I suggest uninstalling/reinstalling homebrew packages. >> >> Satish >> >> >> >> On Tue, 12 Jan 2016, Mark Adams wrote: >> >> > I did nuke the arch directory. This has worked in the past and don't >> know >> > what I might have changed. it's been awhile since I've reconfigured. >> > Thanks, >> > Mark >> > >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 266713 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Jan 13 11:49:21 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 13 Jan 2016 11:49:21 -0600 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: >>>>>>>> Executing: mpif90 -o /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest.o Testing executable /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest to see if it can be run Executing: /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest Executing: /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest ERROR while running executable: Could not execute "/var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest": dyld: Library not loaded: /Users/markadams/homebrew/lib/gcc/x86_64-apple-darwin13.4.0/4.9.1/libgfortran.3.dylib Referenced from: /Users/markadams/homebrew/lib/libmpifort.12.dylib Reason: image not found <<<<<<<<<<< Mostlikely you haven't reinstalled mpich - as its refering to gfortran-4.9.1. Current gfortran is 5.3 GNU Fortran (Homebrew gcc 5.3.0) 5.3.0 This is what I would do to reinstall brew 1. Make list of pkgs to reinstall brew leaves > reinstall.lst 2. delete all installed brew pacakges. brew cleanup brew list > delete.lst brew remove `cat delete.lst 3. Now reinstall all required packages brew update brew install `cat reinstall.lst` Satish On Wed, 13 Jan 2016, Mark Adams wrote: > I'm still having problems. I have upgraded gcc and mpich. I am now > upgrading everything from homebrew. Any ideas on this error? 
> thanks, > > On Wed, Jan 13, 2016 at 2:15 AM, Matthew Knepley wrote: > > > On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay wrote: > > > >> > 'file' object has no attribute 'getvalue' File > >> "/Users/markadams/Codes/petsc/config/configure.py", line 363, in > >> petsc_configure > >> > >> Hm - have to figure this one out - but the primary issue is: > >> > >> > stderr: > >> > gfortran: warning: couldn't understand kern.osversion '15.2.0 > >> > ld: -rpath can only be used when targeting Mac OS X 10.5 or later > >> > > > > I get this. The remedy I use is to put > > > > MACOSX_DEPLOYMENT_TARGET=10.5 > > > > in the environment. Its annoying, and quintessentially Mac. > > > > Matt > > > > > >> Perhaps you've updated xcode or OSX - but did not reinstall brew/gfortran. > >> > >> > Executing: mpif90 --version > >> > stdout: > >> > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 > >> > >> I suggest uninstalling/reinstalling homebrew packages. > >> > >> Satish > >> > >> > >> > >> On Tue, 12 Jan 2016, Mark Adams wrote: > >> > >> > I did nuke the arch directory. This has worked in the past and don't > >> know > >> > what I might have changed. it's been awhile since I've reconfigured. > >> > Thanks, > >> > Mark > >> > > >> > >> > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > From mhasan8 at vols.utk.edu Wed Jan 13 13:13:33 2016 From: mhasan8 at vols.utk.edu (Hasan, Fahad) Date: Wed, 13 Jan 2016 19:13:33 +0000 Subject: [petsc-users] ODE Solver on multiple cores Message-ID: Hello, I have written a code to solve a simple differential equation (x''+x'+6x=0 with initial values, x(0)=2, x'(0)=3). It works well on a single core and produces result close to theoretical answer but whenever I am trying to run the same code on multiple cores, I am getting incorrect results. It seems to me that, for multiple cores it stops after taking only 2 steps regardless of the final time and gives the final result (which is inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN etc.) but I always ended up with the same issue. Can you tell me what possibly may cause this problem? Thanks in advance. Regards, Fahad From hzhang at mcs.anl.gov Wed Jan 13 13:28:29 2016 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 13 Jan 2016 13:28:29 -0600 Subject: [petsc-users] ODE Solver on multiple cores In-Reply-To: References: Message-ID: Fahad: Run your code with '-ts_view' to see what solvers are being used for sequential and parallel runs. Hong Hello, > > > > I have written a code to solve a simple differential equation (x''+x'+6x=0 > with initial values, x(0)=2, x'(0)=3). It works well on a single core and > produces result close to theoretical answer but whenever I am trying to run > the same code on multiple cores, I am getting incorrect results. It seems > to me that, for multiple cores it stops after taking only 2 steps > regardless of the final time and gives the final result (which is > inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN > etc.) but I always ended up with the same issue. > > > > Can you tell me what possibly may cause this problem? Thanks in advance. > > > > Regards, > > Fahad >
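For reference, x''+x'+6x=0 can be recast as the first-order system u' = v, v' = -v - 6u with u(0)=2, v(0)=3, which is the form TS works with. A bare-bones RHS routine for that system might look like the following (names are illustrative, and no parallel decomposition is handled here):

#include <petscts.h>

static PetscErrorCode MyRHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
{
  const PetscScalar *u;
  PetscScalar       *f;
  PetscErrorCode    ierr;

  PetscFunctionBegin;
  ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecGetArray(F, &f);CHKERRQ(ierr);
  f[0] = u[1];              /* x' = v       */
  f[1] = -u[1] - 6.0*u[0];  /* v' = -v - 6x */
  ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* registered with: TSSetRHSFunction(ts, NULL, MyRHSFunction, NULL); */

On more than one process the two entries of U end up on different ranks, so indexing them directly like this no longer works; that is where a DM (or explicit scatters) comes in, as the replies below point out.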
From bsmith at mcs.anl.gov Wed Jan 13 13:35:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 13:35:39 -0600 Subject: [petsc-users] ODE Solver on multiple cores In-Reply-To: References: Message-ID: Likely there is something wrong with the IFunction or RHSFunction or their Jacobians that you provide in parallel. For the example you are running the easiest way to manage the parallelism of the data is with a DMDACreate1d(). Otherwise you need to manage the ghost point communication yourself by setting up VecScatters. Barry > On Jan 13, 2016, at 1:13 PM, Hasan, Fahad wrote: > > Hello, > > I have written a code to solve a simple differential equation (x''+x'+6x=0 with initial values, x(0)=2, x'(0)=3). It works well on a single core and produces result close to theoretical answer but whenever I am trying to run the same code on multiple cores, I am getting incorrect results. It seems to me that, for multiple cores it stops after taking only 2 steps regardless of the final time and gives the final result (which is inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN etc.) but I always ended up with the same issue. > > Can you tell me what possibly may cause this problem? Thanks in advance. > > Regards, > Fahad From hongzhang at anl.gov Wed Jan 13 14:02:45 2016 From: hongzhang at anl.gov (Hong Zhang) Date: Wed, 13 Jan 2016 14:02:45 -0600 Subject: [petsc-users] ODE Solver on multiple cores In-Reply-To: References: Message-ID: <84E686A1-AFEA-4C2C-8AB2-C0A127B89029@anl.gov> If x is just a scalar, it would not be a surprise that the code does not run in parallel. If x is a vector, you need a DM object to handle the decomposition. Hong On Jan 13, 2016, at 1:13 PM, Hasan, Fahad wrote: > Hello, > > I have written a code to solve a simple differential equation (x''+x'+6x=0 with initial values, x(0)=2, x'(0)=3). It works well on a single core and produces result close to theoretical answer but whenever I am trying to run the same code on multiple cores, I am getting incorrect results. It seems to me that, for multiple cores it stops after taking only 2 steps regardless of the final time and gives the final result (which is inaccurate). I tried different TSType (TSEULER, TSBEULER, TSSUNDIALS, TSCN etc.) but I always ended up with the same issue. > > Can you tell me what possibly may cause this problem? Thanks in advance. > > Regards, > Fahad From david.knezevic at akselos.com Wed Jan 13 14:48:57 2016 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 13 Jan 2016 15:48:57 -0500 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel Message-ID: I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear PDE solve. It converges well when I use 1 core. When I use 2 or more cores, the line search stagnates. I've pasted the output of -snes_linesearch_monitor below in these two cases. I was wondering if this implies that I must have a bug in parallel, or if perhaps the NEWTONLS solver can behave slightly differently in parallel?
Thanks, David --------------------------------------------------------------------------------------- *Serial case:* NL step 0, |residual|_2 = 4.714515e-02 Line search: gnorm after quadratic fit 7.862867755323e-02 Line search: Cubically determined step, current gnorm 4.663945043239e-02 lambda=1.4276549921126183e-02 NL step 1, |residual|_2 = 4.663945e-02 Line search: gnorm after quadratic fit 6.977268575068e-02 Line search: Cubically determined step, current gnorm 4.594912794004e-02 lambda=2.3644825912085998e-02 NL step 2, |residual|_2 = 4.594913e-02 Line search: gnorm after quadratic fit 5.502067932478e-02 Line search: Cubically determined step, current gnorm 4.494531294405e-02 lambda=4.1260497615261321e-02 NL step 3, |residual|_2 = 4.494531e-02 Line search: gnorm after quadratic fit 5.415371063247e-02 Line search: Cubically determined step, current gnorm 4.392165925471e-02 lambda=3.6375618871780056e-02 NL step 4, |residual|_2 = 4.392166e-02 Line search: gnorm after quadratic fit 4.631663976615e-02 Line search: Cubically determined step, current gnorm 4.246200798775e-02 lambda=5.0000000000000003e-02 NL step 5, |residual|_2 = 4.246201e-02 Line search: gnorm after quadratic fit 4.222105321728e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 6, |residual|_2 = 4.222105e-02 Line search: gnorm after quadratic fit 4.026081251872e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 7, |residual|_2 = 4.026081e-02 Line search: gnorm after quadratic fit 3.776439532346e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 8, |residual|_2 = 3.776440e-02 Line search: gnorm after quadratic fit 3.659796311121e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 9, |residual|_2 = 3.659796e-02 Line search: gnorm after quadratic fit 3.423207664901e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 10, |residual|_2 = 3.423208e-02 Line search: gnorm after quadratic fit 3.116928452225e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 11, |residual|_2 = 3.116928e-02 Line search: gnorm after quadratic fit 2.874310955274e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 12, |residual|_2 = 2.874311e-02 Line search: gnorm after quadratic fit 2.587826662305e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 13, |residual|_2 = 2.587827e-02 Line search: gnorm after quadratic fit 2.344161073075e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 14, |residual|_2 = 2.344161e-02 Line search: gnorm after quadratic fit 2.187719889554e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 15, |residual|_2 = 2.187720e-02 Line search: gnorm after quadratic fit 1.983089075086e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 16, |residual|_2 = 1.983089e-02 Line search: gnorm after quadratic fit 1.791227711151e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 17, |residual|_2 = 1.791228e-02 Line search: gnorm after quadratic fit 1.613250573900e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 18, |residual|_2 = 1.613251e-02 Line search: gnorm after quadratic fit 1.455841843183e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 19, |residual|_2 = 
1.455842e-02 Line search: gnorm after quadratic fit 1.321849780208e-02 Line search: Quadratically determined step, lambda=1.0574876450981290e-01 NL step 20, |residual|_2 = 1.321850e-02 Line search: gnorm after quadratic fit 9.209641609489e-03 Line search: Quadratically determined step, lambda=3.0589684959139674e-01 NL step 21, |residual|_2 = 9.209642e-03 Line search: gnorm after quadratic fit 7.590942028574e-03 Line search: Quadratically determined step, lambda=2.0920305898507460e-01 NL step 22, |residual|_2 = 7.590942e-03 Line search: gnorm after quadratic fit 4.373918927227e-03 Line search: Quadratically determined step, lambda=4.2379743128074154e-01 NL step 23, |residual|_2 = 4.373919e-03 Line search: gnorm after quadratic fit 3.681351665911e-03 Line search: Quadratically determined step, lambda=1.9626618428089049e-01 NL step 24, |residual|_2 = 3.681352e-03 Line search: gnorm after quadratic fit 2.594782418891e-03 Line search: Quadratically determined step, lambda=3.8057533372167579e-01 NL step 25, |residual|_2 = 2.594782e-03 Line search: gnorm after quadratic fit 1.803188279452e-03 Line search: Quadratically determined step, lambda=4.3574109448916826e-01 NL step 26, |residual|_2 = 1.803188e-03 Line search: Using full step: fnorm 1.803188279452e-03 gnorm 9.015947319176e-04 NL step 27, |residual|_2 = 9.015947e-04 Line search: Using full step: fnorm 9.015947319176e-04 gnorm 7.088879385731e-08 NL step 28, |residual|_2 = 7.088879e-08 Line search: gnorm after quadratic fit 7.088878906502e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088878957116e-08 lambda=2.1132490715284968e-01 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385683e-08 lambda=9.2196195824189087e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385711e-08 lambda=4.0004532931495446e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385722e-08 lambda=1.7374764617622523e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385726e-08 lambda=7.5449542135114234e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.2764749100364717e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4228361655588414e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.1787884492365153e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.6831916265377548e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.1651988987471248e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.0599757911789984e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.1973377296845284e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.5421268734746417e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.1437501409853001e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.7994589108402447e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=7.8143041004756041e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.3934283359762142e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 
lambda=1.4736252548330828e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.3993436038104693e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.7789696481734489e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2067913185456743e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.2405944320925838e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.2757729177880525e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.8827383810151057e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.2916635989551390e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.8636915940199893e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=8.0932400164504977e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.5145586412497970e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.5262271250668997e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.6277717206096633e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.8781665100197773e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2498684035299616e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.4276603549660526e-13 Line search: unable to find good step length! 
After 33 tries Line search: fnorm=7.0888793857309783e-08, gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial slope=-5.0252210945441613e-15 *Parallel case:* NL step 0, |residual|_2 = 4.714515e-02 Line search: gnorm after quadratic fit 7.862867755323e-02 Line search: Cubically determined step, current gnorm 4.663945043239e-02 lambda=1.4276549921126183e-02 NL step 1, |residual|_2 = 4.663945e-02 Line search: gnorm after quadratic fit 6.977268575068e-02 Line search: Cubically determined step, current gnorm 4.594912794004e-02 lambda=2.3644825912085998e-02 NL step 2, |residual|_2 = 4.594913e-02 Line search: gnorm after quadratic fit 5.502067932478e-02 Line search: Cubically determined step, current gnorm 4.494531294405e-02 lambda=4.1260497615261321e-02 NL step 3, |residual|_2 = 4.494531e-02 Line search: gnorm after quadratic fit 5.415371063247e-02 Line search: Cubically determined step, current gnorm 4.392165925471e-02 lambda=3.6375618871780056e-02 NL step 4, |residual|_2 = 4.392166e-02 Line search: gnorm after quadratic fit 4.631663976615e-02 Line search: Cubically determined step, current gnorm 4.246200798775e-02 lambda=5.0000000000000003e-02 NL step 5, |residual|_2 = 4.246201e-02 Line search: gnorm after quadratic fit 4.222105321728e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 6, |residual|_2 = 4.222105e-02 Line search: gnorm after quadratic fit 4.026081251872e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 7, |residual|_2 = 4.026081e-02 Line search: gnorm after quadratic fit 3.776439532346e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 8, |residual|_2 = 3.776440e-02 Line search: gnorm after quadratic fit 3.659796311121e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 9, |residual|_2 = 3.659796e-02 Line search: gnorm after quadratic fit 3.423207664901e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 10, |residual|_2 = 3.423208e-02 Line search: gnorm after quadratic fit 3.116928452225e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 11, |residual|_2 = 3.116928e-02 Line search: gnorm after quadratic fit 2.874310955274e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 12, |residual|_2 = 2.874311e-02 Line search: gnorm after quadratic fit 2.587826662305e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 13, |residual|_2 = 2.587827e-02 Line search: gnorm after quadratic fit 2.344161073075e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 14, |residual|_2 = 2.344161e-02 Line search: gnorm after quadratic fit 2.187719889554e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 15, |residual|_2 = 2.187720e-02 Line search: gnorm after quadratic fit 1.983089075086e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 16, |residual|_2 = 1.983089e-02 Line search: gnorm after quadratic fit 1.791227711151e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 17, |residual|_2 = 1.791228e-02 Line search: gnorm after quadratic fit 1.613250573900e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 18, |residual|_2 = 1.613251e-02 Line search: gnorm after quadratic fit 
1.455841843183e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 19, |residual|_2 = 1.455842e-02 Line search: gnorm after quadratic fit 1.321849780208e-02 Line search: Quadratically determined step, lambda=1.0574876450981290e-01 NL step 20, |residual|_2 = 1.321850e-02 Line search: gnorm after quadratic fit 9.209641609489e-03 Line search: Quadratically determined step, lambda=3.0589684959139674e-01 NL step 21, |residual|_2 = 9.209642e-03 Line search: gnorm after quadratic fit 7.590942028574e-03 Line search: Quadratically determined step, lambda=2.0920305898507460e-01 NL step 22, |residual|_2 = 7.590942e-03 Line search: gnorm after quadratic fit 4.373918927227e-03 Line search: Quadratically determined step, lambda=4.2379743128074154e-01 NL step 23, |residual|_2 = 4.373919e-03 Line search: gnorm after quadratic fit 3.681351665911e-03 Line search: Quadratically determined step, lambda=1.9626618428089049e-01 NL step 24, |residual|_2 = 3.681352e-03 Line search: gnorm after quadratic fit 2.594782418891e-03 Line search: Quadratically determined step, lambda=3.8057533372167579e-01 NL step 25, |residual|_2 = 2.594782e-03 Line search: gnorm after quadratic fit 1.803188279452e-03 Line search: Quadratically determined step, lambda=4.3574109448916826e-01 NL step 26, |residual|_2 = 1.803188e-03 Line search: Using full step: fnorm 1.803188279452e-03 gnorm 9.015947319176e-04 NL step 27, |residual|_2 = 9.015947e-04 Line search: Using full step: fnorm 9.015947319176e-04 gnorm 7.088879385731e-08 NL step 28, |residual|_2 = 7.088879e-08 Line search: gnorm after quadratic fit 7.088878906502e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088878957116e-08 lambda=2.1132490715284968e-01 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385683e-08 lambda=9.2196195824189087e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385711e-08 lambda=4.0004532931495446e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385722e-08 lambda=1.7374764617622523e-02 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385726e-08 lambda=7.5449542135114234e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.2764749100364717e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4228361655588414e-03 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.1787884492365153e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.6831916265377548e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.1651988987471248e-04 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.0599757911789984e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.1973377296845284e-05 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.5421268734746417e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.1437501409853001e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.7994589108402447e-06 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=7.8143041004756041e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 
lambda=3.3934283359762142e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4736252548330828e-07 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.3993436038104693e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.7789696481734489e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2067913185456743e-08 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.2405944320925838e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.2757729177880525e-09 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.8827383810151057e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.2916635989551390e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.8636915940199893e-10 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=8.0932400164504977e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.5145586412497970e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.5262271250668997e-11 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.6277717206096633e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.8781665100197773e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2498684035299616e-12 Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.4276603549660526e-13 Line search: unable to find good step length! After 33 tries Line search: fnorm=7.0888793857309783e-08, gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial slope=-5.0252210945441613e-15 -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.knezevic at akselos.com Wed Jan 13 14:51:19 2016 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 13 Jan 2016 15:51:19 -0500 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel In-Reply-To: References: Message-ID: Oops! I pasted the wrong text for the serial case. 
The correct text is below: *Serial case:* NL step 0, |residual|_2 = 4.714515e-02 Line search: gnorm after quadratic fit 7.862867755130e-02 Line search: Cubically determined step, current gnorm 4.663945044088e-02 lambda=1.4276549223307832e-02 NL step 1, |residual|_2 = 4.663945e-02 Line search: gnorm after quadratic fit 6.977268532963e-02 Line search: Cubically determined step, current gnorm 4.594912791877e-02 lambda=2.3644826349821228e-02 NL step 2, |residual|_2 = 4.594913e-02 Line search: gnorm after quadratic fit 5.502067915588e-02 Line search: Cubically determined step, current gnorm 4.494531287593e-02 lambda=4.1260496881982515e-02 NL step 3, |residual|_2 = 4.494531e-02 Line search: gnorm after quadratic fit 5.415371014813e-02 Line search: Cubically determined step, current gnorm 4.392165909219e-02 lambda=3.6375617606865668e-02 NL step 4, |residual|_2 = 4.392166e-02 Line search: gnorm after quadratic fit 4.631663907262e-02 Line search: Cubically determined step, current gnorm 4.246200768767e-02 lambda=5.0000000000000003e-02 NL step 5, |residual|_2 = 4.246201e-02 Line search: gnorm after quadratic fit 4.222105256158e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 6, |residual|_2 = 4.222105e-02 Line search: gnorm after quadratic fit 4.026081168915e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 7, |residual|_2 = 4.026081e-02 Line search: gnorm after quadratic fit 3.776439443011e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 8, |residual|_2 = 3.776439e-02 Line search: gnorm after quadratic fit 3.659796213553e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 9, |residual|_2 = 3.659796e-02 Line search: gnorm after quadratic fit 3.423207563496e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 10, |residual|_2 = 3.423208e-02 Line search: gnorm after quadratic fit 3.116928356075e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 11, |residual|_2 = 3.116928e-02 Line search: gnorm after quadratic fit 2.874310673331e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 12, |residual|_2 = 2.874311e-02 Line search: gnorm after quadratic fit 2.587826447631e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 13, |residual|_2 = 2.587826e-02 Line search: gnorm after quadratic fit 2.344160918669e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 14, |residual|_2 = 2.344161e-02 Line search: gnorm after quadratic fit 2.187719801063e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 15, |residual|_2 = 2.187720e-02 Line search: gnorm after quadratic fit 1.983089025936e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 16, |residual|_2 = 1.983089e-02 Line search: gnorm after quadratic fit 1.791227696650e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 17, |residual|_2 = 1.791228e-02 Line search: gnorm after quadratic fit 1.613250592206e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 18, |residual|_2 = 1.613251e-02 Line search: gnorm after quadratic fit 1.455841890804e-02 Line search: Quadratically determined step, lambda=1.0000000000000001e-01 NL step 19, |residual|_2 = 1.455842e-02 Line search: gnorm after quadratic fit 1.321849665170e-02 Line 
search: Quadratically determined step, lambda=1.0574900347563776e-01 NL step 20, |residual|_2 = 1.321850e-02 Line search: gnorm after quadratic fit 9.209642717528e-03 Line search: Quadratically determined step, lambda=3.0589679103560180e-01 NL step 21, |residual|_2 = 9.209643e-03 Line search: gnorm after quadratic fit 7.590944125425e-03 Line search: Quadratically determined step, lambda=2.0920307644146574e-01 NL step 22, |residual|_2 = 7.590944e-03 Line search: gnorm after quadratic fit 4.373921456388e-03 Line search: Quadratically determined step, lambda=4.2379743756255861e-01 NL step 23, |residual|_2 = 4.373921e-03 Line search: gnorm after quadratic fit 3.681355014898e-03 Line search: Quadratically determined step, lambda=1.9626628361883081e-01 NL step 24, |residual|_2 = 3.681355e-03 Line search: gnorm after quadratic fit 2.594785108727e-03 Line search: Quadratically determined step, lambda=3.8057573229158653e-01 NL step 25, |residual|_2 = 2.594785e-03 Line search: gnorm after quadratic fit 1.803191839408e-03 Line search: Quadratically determined step, lambda=4.3574150080610474e-01 NL step 26, |residual|_2 = 1.803192e-03 Line search: Using full step: fnorm 1.803191839408e-03 gnorm 9.015954497317e-04 NL step 27, |residual|_2 = 9.015954e-04 Line search: Using full step: fnorm 9.015954497317e-04 gnorm 1.390181456520e-13 NL step 28, |residual|_2 = 1.390181e-13 Number of nonlinear iterations: 28 On Wed, Jan 13, 2016 at 3:48 PM, David Knezevic wrote: > I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear > PDE solve. It converges well when I use 1 core. When I use 2 or more cores, > the line search stagnates. I've pasted the output of > -snes_linesearch_monitor below in these two cases. > > I was wondering if this implies that I must have a bug in parallel, or if > perhaps the NEWTONLS solver can behave slightly differently in parallel? 
> > Thanks, > David > > > --------------------------------------------------------------------------------------- > > > > *Parallel case:* > NL step 0, |residual|_2 = 4.714515e-02 > Line search: gnorm after quadratic fit 7.862867755323e-02 > Line search: Cubically determined step, current gnorm > 4.663945043239e-02 lambda=1.4276549921126183e-02 > NL step 1, |residual|_2 = 4.663945e-02 > Line search: gnorm after quadratic fit 6.977268575068e-02 > Line search: Cubically determined step, current gnorm > 4.594912794004e-02 lambda=2.3644825912085998e-02 > NL step 2, |residual|_2 = 4.594913e-02 > Line search: gnorm after quadratic fit 5.502067932478e-02 > Line search: Cubically determined step, current gnorm > 4.494531294405e-02 lambda=4.1260497615261321e-02 > NL step 3, |residual|_2 = 4.494531e-02 > Line search: gnorm after quadratic fit 5.415371063247e-02 > Line search: Cubically determined step, current gnorm > 4.392165925471e-02 lambda=3.6375618871780056e-02 > NL step 4, |residual|_2 = 4.392166e-02 > Line search: gnorm after quadratic fit 4.631663976615e-02 > Line search: Cubically determined step, current gnorm > 4.246200798775e-02 lambda=5.0000000000000003e-02 > NL step 5, |residual|_2 = 4.246201e-02 > Line search: gnorm after quadratic fit 4.222105321728e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 6, |residual|_2 = 4.222105e-02 > Line search: gnorm after quadratic fit 4.026081251872e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 7, |residual|_2 = 4.026081e-02 > Line search: gnorm after quadratic fit 3.776439532346e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 8, |residual|_2 = 3.776440e-02 > Line search: gnorm after quadratic fit 3.659796311121e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 9, |residual|_2 = 3.659796e-02 > Line search: gnorm after quadratic fit 3.423207664901e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 10, |residual|_2 = 3.423208e-02 > Line search: gnorm after quadratic fit 3.116928452225e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 11, |residual|_2 = 3.116928e-02 > Line search: gnorm after quadratic fit 2.874310955274e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 12, |residual|_2 = 2.874311e-02 > Line search: gnorm after quadratic fit 2.587826662305e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 13, |residual|_2 = 2.587827e-02 > Line search: gnorm after quadratic fit 2.344161073075e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 14, |residual|_2 = 2.344161e-02 > Line search: gnorm after quadratic fit 2.187719889554e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 15, |residual|_2 = 2.187720e-02 > Line search: gnorm after quadratic fit 1.983089075086e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 16, |residual|_2 = 1.983089e-02 > Line search: gnorm after quadratic fit 1.791227711151e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 17, |residual|_2 = 1.791228e-02 > Line search: gnorm after quadratic fit 1.613250573900e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 18, |residual|_2 = 
1.613251e-02 > Line search: gnorm after quadratic fit 1.455841843183e-02 > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > NL step 19, |residual|_2 = 1.455842e-02 > Line search: gnorm after quadratic fit 1.321849780208e-02 > Line search: Quadratically determined step, > lambda=1.0574876450981290e-01 > NL step 20, |residual|_2 = 1.321850e-02 > Line search: gnorm after quadratic fit 9.209641609489e-03 > Line search: Quadratically determined step, > lambda=3.0589684959139674e-01 > NL step 21, |residual|_2 = 9.209642e-03 > Line search: gnorm after quadratic fit 7.590942028574e-03 > Line search: Quadratically determined step, > lambda=2.0920305898507460e-01 > NL step 22, |residual|_2 = 7.590942e-03 > Line search: gnorm after quadratic fit 4.373918927227e-03 > Line search: Quadratically determined step, > lambda=4.2379743128074154e-01 > NL step 23, |residual|_2 = 4.373919e-03 > Line search: gnorm after quadratic fit 3.681351665911e-03 > Line search: Quadratically determined step, > lambda=1.9626618428089049e-01 > NL step 24, |residual|_2 = 3.681352e-03 > Line search: gnorm after quadratic fit 2.594782418891e-03 > Line search: Quadratically determined step, > lambda=3.8057533372167579e-01 > NL step 25, |residual|_2 = 2.594782e-03 > Line search: gnorm after quadratic fit 1.803188279452e-03 > Line search: Quadratically determined step, > lambda=4.3574109448916826e-01 > NL step 26, |residual|_2 = 1.803188e-03 > Line search: Using full step: fnorm 1.803188279452e-03 gnorm > 9.015947319176e-04 > NL step 27, |residual|_2 = 9.015947e-04 > Line search: Using full step: fnorm 9.015947319176e-04 gnorm > 7.088879385731e-08 > NL step 28, |residual|_2 = 7.088879e-08 > Line search: gnorm after quadratic fit 7.088878906502e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088878957116e-08 lambda=2.1132490715284968e-01 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385683e-08 lambda=9.2196195824189087e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385711e-08 lambda=4.0004532931495446e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385722e-08 lambda=1.7374764617622523e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385726e-08 lambda=7.5449542135114234e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.2764749100364717e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4228361655588414e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.1787884492365153e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.6831916265377548e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.1651988987471248e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.0599757911789984e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.1973377296845284e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.5421268734746417e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.1437501409853001e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.7994589108402447e-06 > Line search: 
Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=7.8143041004756041e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.3934283359762142e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4736252548330828e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.3993436038104693e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.7789696481734489e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2067913185456743e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.2405944320925838e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.2757729177880525e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.8827383810151057e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.2916635989551390e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.8636915940199893e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=8.0932400164504977e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.5145586412497970e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.5262271250668997e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.6277717206096633e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.8781665100197773e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2498684035299616e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.4276603549660526e-13 > Line search: unable to find good step length! After 33 tries > Line search: fnorm=7.0888793857309783e-08, > gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, > minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial > slope=-5.0252210945441613e-15 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jan 13 15:05:49 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 15:05:49 -0600 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel In-Reply-To: References: Message-ID: Since you are using a direct solver almost for sure a bug in your parallel function or parallel Jacobian. Try -snes_mf_operator try -snes_fd try -snes_type test as three different approaches to see what is going on. Barry > On Jan 13, 2016, at 2:51 PM, David Knezevic wrote: > > Oops! I pasted the wrong text for the serial case. 
The correct text is below: > > Serial case: > NL step 0, |residual|_2 = 4.714515e-02 > Line search: gnorm after quadratic fit 7.862867755130e-02 > Line search: Cubically determined step, current gnorm 4.663945044088e-02 lambda=1.4276549223307832e-02 > NL step 1, |residual|_2 = 4.663945e-02 > Line search: gnorm after quadratic fit 6.977268532963e-02 > Line search: Cubically determined step, current gnorm 4.594912791877e-02 lambda=2.3644826349821228e-02 > NL step 2, |residual|_2 = 4.594913e-02 > Line search: gnorm after quadratic fit 5.502067915588e-02 > Line search: Cubically determined step, current gnorm 4.494531287593e-02 lambda=4.1260496881982515e-02 > NL step 3, |residual|_2 = 4.494531e-02 > Line search: gnorm after quadratic fit 5.415371014813e-02 > Line search: Cubically determined step, current gnorm 4.392165909219e-02 lambda=3.6375617606865668e-02 > NL step 4, |residual|_2 = 4.392166e-02 > Line search: gnorm after quadratic fit 4.631663907262e-02 > Line search: Cubically determined step, current gnorm 4.246200768767e-02 lambda=5.0000000000000003e-02 > NL step 5, |residual|_2 = 4.246201e-02 > Line search: gnorm after quadratic fit 4.222105256158e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 6, |residual|_2 = 4.222105e-02 > Line search: gnorm after quadratic fit 4.026081168915e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 7, |residual|_2 = 4.026081e-02 > Line search: gnorm after quadratic fit 3.776439443011e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 8, |residual|_2 = 3.776439e-02 > Line search: gnorm after quadratic fit 3.659796213553e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 9, |residual|_2 = 3.659796e-02 > Line search: gnorm after quadratic fit 3.423207563496e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 10, |residual|_2 = 3.423208e-02 > Line search: gnorm after quadratic fit 3.116928356075e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 11, |residual|_2 = 3.116928e-02 > Line search: gnorm after quadratic fit 2.874310673331e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 12, |residual|_2 = 2.874311e-02 > Line search: gnorm after quadratic fit 2.587826447631e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 13, |residual|_2 = 2.587826e-02 > Line search: gnorm after quadratic fit 2.344160918669e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 14, |residual|_2 = 2.344161e-02 > Line search: gnorm after quadratic fit 2.187719801063e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 15, |residual|_2 = 2.187720e-02 > Line search: gnorm after quadratic fit 1.983089025936e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 16, |residual|_2 = 1.983089e-02 > Line search: gnorm after quadratic fit 1.791227696650e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 17, |residual|_2 = 1.791228e-02 > Line search: gnorm after quadratic fit 1.613250592206e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 18, |residual|_2 = 1.613251e-02 > Line search: gnorm after quadratic fit 1.455841890804e-02 > Line search: Quadratically determined step, 
lambda=1.0000000000000001e-01 > NL step 19, |residual|_2 = 1.455842e-02 > Line search: gnorm after quadratic fit 1.321849665170e-02 > Line search: Quadratically determined step, lambda=1.0574900347563776e-01 > NL step 20, |residual|_2 = 1.321850e-02 > Line search: gnorm after quadratic fit 9.209642717528e-03 > Line search: Quadratically determined step, lambda=3.0589679103560180e-01 > NL step 21, |residual|_2 = 9.209643e-03 > Line search: gnorm after quadratic fit 7.590944125425e-03 > Line search: Quadratically determined step, lambda=2.0920307644146574e-01 > NL step 22, |residual|_2 = 7.590944e-03 > Line search: gnorm after quadratic fit 4.373921456388e-03 > Line search: Quadratically determined step, lambda=4.2379743756255861e-01 > NL step 23, |residual|_2 = 4.373921e-03 > Line search: gnorm after quadratic fit 3.681355014898e-03 > Line search: Quadratically determined step, lambda=1.9626628361883081e-01 > NL step 24, |residual|_2 = 3.681355e-03 > Line search: gnorm after quadratic fit 2.594785108727e-03 > Line search: Quadratically determined step, lambda=3.8057573229158653e-01 > NL step 25, |residual|_2 = 2.594785e-03 > Line search: gnorm after quadratic fit 1.803191839408e-03 > Line search: Quadratically determined step, lambda=4.3574150080610474e-01 > NL step 26, |residual|_2 = 1.803192e-03 > Line search: Using full step: fnorm 1.803191839408e-03 gnorm 9.015954497317e-04 > NL step 27, |residual|_2 = 9.015954e-04 > Line search: Using full step: fnorm 9.015954497317e-04 gnorm 1.390181456520e-13 > NL step 28, |residual|_2 = 1.390181e-13 > Number of nonlinear iterations: 28 > > > On Wed, Jan 13, 2016 at 3:48 PM, David Knezevic wrote: > I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear PDE solve. It converges well when I use 1 core. When I use 2 or more cores, the line search stagnates. I've pasted the output of -snes_linesearch_monitor below in these two cases. > > I was wondering if this implies that I must have a bug in parallel, or if perhaps the NEWTONLS solver can behave slightly differently in parallel? 
> > Thanks, > David > > --------------------------------------------------------------------------------------- > > > > Parallel case: > NL step 0, |residual|_2 = 4.714515e-02 > Line search: gnorm after quadratic fit 7.862867755323e-02 > Line search: Cubically determined step, current gnorm 4.663945043239e-02 lambda=1.4276549921126183e-02 > NL step 1, |residual|_2 = 4.663945e-02 > Line search: gnorm after quadratic fit 6.977268575068e-02 > Line search: Cubically determined step, current gnorm 4.594912794004e-02 lambda=2.3644825912085998e-02 > NL step 2, |residual|_2 = 4.594913e-02 > Line search: gnorm after quadratic fit 5.502067932478e-02 > Line search: Cubically determined step, current gnorm 4.494531294405e-02 lambda=4.1260497615261321e-02 > NL step 3, |residual|_2 = 4.494531e-02 > Line search: gnorm after quadratic fit 5.415371063247e-02 > Line search: Cubically determined step, current gnorm 4.392165925471e-02 lambda=3.6375618871780056e-02 > NL step 4, |residual|_2 = 4.392166e-02 > Line search: gnorm after quadratic fit 4.631663976615e-02 > Line search: Cubically determined step, current gnorm 4.246200798775e-02 lambda=5.0000000000000003e-02 > NL step 5, |residual|_2 = 4.246201e-02 > Line search: gnorm after quadratic fit 4.222105321728e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 6, |residual|_2 = 4.222105e-02 > Line search: gnorm after quadratic fit 4.026081251872e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 7, |residual|_2 = 4.026081e-02 > Line search: gnorm after quadratic fit 3.776439532346e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 8, |residual|_2 = 3.776440e-02 > Line search: gnorm after quadratic fit 3.659796311121e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 9, |residual|_2 = 3.659796e-02 > Line search: gnorm after quadratic fit 3.423207664901e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 10, |residual|_2 = 3.423208e-02 > Line search: gnorm after quadratic fit 3.116928452225e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 11, |residual|_2 = 3.116928e-02 > Line search: gnorm after quadratic fit 2.874310955274e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 12, |residual|_2 = 2.874311e-02 > Line search: gnorm after quadratic fit 2.587826662305e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 13, |residual|_2 = 2.587827e-02 > Line search: gnorm after quadratic fit 2.344161073075e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 14, |residual|_2 = 2.344161e-02 > Line search: gnorm after quadratic fit 2.187719889554e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 15, |residual|_2 = 2.187720e-02 > Line search: gnorm after quadratic fit 1.983089075086e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 16, |residual|_2 = 1.983089e-02 > Line search: gnorm after quadratic fit 1.791227711151e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 17, |residual|_2 = 1.791228e-02 > Line search: gnorm after quadratic fit 1.613250573900e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 18, |residual|_2 = 1.613251e-02 > Line search: gnorm after quadratic 
fit 1.455841843183e-02 > Line search: Quadratically determined step, lambda=1.0000000000000001e-01 > NL step 19, |residual|_2 = 1.455842e-02 > Line search: gnorm after quadratic fit 1.321849780208e-02 > Line search: Quadratically determined step, lambda=1.0574876450981290e-01 > NL step 20, |residual|_2 = 1.321850e-02 > Line search: gnorm after quadratic fit 9.209641609489e-03 > Line search: Quadratically determined step, lambda=3.0589684959139674e-01 > NL step 21, |residual|_2 = 9.209642e-03 > Line search: gnorm after quadratic fit 7.590942028574e-03 > Line search: Quadratically determined step, lambda=2.0920305898507460e-01 > NL step 22, |residual|_2 = 7.590942e-03 > Line search: gnorm after quadratic fit 4.373918927227e-03 > Line search: Quadratically determined step, lambda=4.2379743128074154e-01 > NL step 23, |residual|_2 = 4.373919e-03 > Line search: gnorm after quadratic fit 3.681351665911e-03 > Line search: Quadratically determined step, lambda=1.9626618428089049e-01 > NL step 24, |residual|_2 = 3.681352e-03 > Line search: gnorm after quadratic fit 2.594782418891e-03 > Line search: Quadratically determined step, lambda=3.8057533372167579e-01 > NL step 25, |residual|_2 = 2.594782e-03 > Line search: gnorm after quadratic fit 1.803188279452e-03 > Line search: Quadratically determined step, lambda=4.3574109448916826e-01 > NL step 26, |residual|_2 = 1.803188e-03 > Line search: Using full step: fnorm 1.803188279452e-03 gnorm 9.015947319176e-04 > NL step 27, |residual|_2 = 9.015947e-04 > Line search: Using full step: fnorm 9.015947319176e-04 gnorm 7.088879385731e-08 > NL step 28, |residual|_2 = 7.088879e-08 > Line search: gnorm after quadratic fit 7.088878906502e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088878957116e-08 lambda=2.1132490715284968e-01 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385683e-08 lambda=9.2196195824189087e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385711e-08 lambda=4.0004532931495446e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385722e-08 lambda=1.7374764617622523e-02 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385726e-08 lambda=7.5449542135114234e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.2764749100364717e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4228361655588414e-03 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.1787884492365153e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.6831916265377548e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.1651988987471248e-04 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.0599757911789984e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.1973377296845284e-05 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.5421268734746417e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.1437501409853001e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.7994589108402447e-06 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=7.8143041004756041e-07 > 
Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.3934283359762142e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.4736252548330828e-07 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.3993436038104693e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.7789696481734489e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2067913185456743e-08 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.2405944320925838e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.2757729177880525e-09 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=9.8827383810151057e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=4.2916635989551390e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.8636915940199893e-10 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=8.0932400164504977e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=3.5145586412497970e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.5262271250668997e-11 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=6.6277717206096633e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=2.8781665100197773e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=1.2498684035299616e-12 > Line search: Cubic step no good, shrinking lambda, current gnorm 7.088879385731e-08 lambda=5.4276603549660526e-13 > Line search: unable to find good step length! After 33 tries > Line search: fnorm=7.0888793857309783e-08, gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial slope=-5.0252210945441613e-15 > > From david.knezevic at akselos.com Wed Jan 13 15:08:01 2016 From: david.knezevic at akselos.com (David Knezevic) Date: Wed, 13 Jan 2016 16:08:01 -0500 Subject: [petsc-users] SNES NEWTONLS serial vs. parallel In-Reply-To: References: Message-ID: OK, will do, thanks. David On Wed, Jan 13, 2016 at 4:05 PM, Barry Smith wrote: > > Since you are using a direct solver almost for sure a bug in your > parallel function or parallel Jacobian. > > Try -snes_mf_operator try -snes_fd try -snes_type test as three > different approaches to see what is going on. > > Barry > > > On Jan 13, 2016, at 2:51 PM, David Knezevic > wrote: > > > > Oops! I pasted the wrong text for the serial case. 
The correct text is > below: > > > > Serial case: > > NL step 0, |residual|_2 = 4.714515e-02 > > Line search: gnorm after quadratic fit 7.862867755130e-02 > > Line search: Cubically determined step, current gnorm > 4.663945044088e-02 lambda=1.4276549223307832e-02 > > NL step 1, |residual|_2 = 4.663945e-02 > > Line search: gnorm after quadratic fit 6.977268532963e-02 > > Line search: Cubically determined step, current gnorm > 4.594912791877e-02 lambda=2.3644826349821228e-02 > > NL step 2, |residual|_2 = 4.594913e-02 > > Line search: gnorm after quadratic fit 5.502067915588e-02 > > Line search: Cubically determined step, current gnorm > 4.494531287593e-02 lambda=4.1260496881982515e-02 > > NL step 3, |residual|_2 = 4.494531e-02 > > Line search: gnorm after quadratic fit 5.415371014813e-02 > > Line search: Cubically determined step, current gnorm > 4.392165909219e-02 lambda=3.6375617606865668e-02 > > NL step 4, |residual|_2 = 4.392166e-02 > > Line search: gnorm after quadratic fit 4.631663907262e-02 > > Line search: Cubically determined step, current gnorm > 4.246200768767e-02 lambda=5.0000000000000003e-02 > > NL step 5, |residual|_2 = 4.246201e-02 > > Line search: gnorm after quadratic fit 4.222105256158e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 6, |residual|_2 = 4.222105e-02 > > Line search: gnorm after quadratic fit 4.026081168915e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 7, |residual|_2 = 4.026081e-02 > > Line search: gnorm after quadratic fit 3.776439443011e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 8, |residual|_2 = 3.776439e-02 > > Line search: gnorm after quadratic fit 3.659796213553e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 9, |residual|_2 = 3.659796e-02 > > Line search: gnorm after quadratic fit 3.423207563496e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 10, |residual|_2 = 3.423208e-02 > > Line search: gnorm after quadratic fit 3.116928356075e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 11, |residual|_2 = 3.116928e-02 > > Line search: gnorm after quadratic fit 2.874310673331e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 12, |residual|_2 = 2.874311e-02 > > Line search: gnorm after quadratic fit 2.587826447631e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 13, |residual|_2 = 2.587826e-02 > > Line search: gnorm after quadratic fit 2.344160918669e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 14, |residual|_2 = 2.344161e-02 > > Line search: gnorm after quadratic fit 2.187719801063e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 15, |residual|_2 = 2.187720e-02 > > Line search: gnorm after quadratic fit 1.983089025936e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 16, |residual|_2 = 1.983089e-02 > > Line search: gnorm after quadratic fit 1.791227696650e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 17, |residual|_2 = 1.791228e-02 > > Line search: gnorm after quadratic fit 1.613250592206e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 18, 
|residual|_2 = 1.613251e-02 > > Line search: gnorm after quadratic fit 1.455841890804e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 19, |residual|_2 = 1.455842e-02 > > Line search: gnorm after quadratic fit 1.321849665170e-02 > > Line search: Quadratically determined step, > lambda=1.0574900347563776e-01 > > NL step 20, |residual|_2 = 1.321850e-02 > > Line search: gnorm after quadratic fit 9.209642717528e-03 > > Line search: Quadratically determined step, > lambda=3.0589679103560180e-01 > > NL step 21, |residual|_2 = 9.209643e-03 > > Line search: gnorm after quadratic fit 7.590944125425e-03 > > Line search: Quadratically determined step, > lambda=2.0920307644146574e-01 > > NL step 22, |residual|_2 = 7.590944e-03 > > Line search: gnorm after quadratic fit 4.373921456388e-03 > > Line search: Quadratically determined step, > lambda=4.2379743756255861e-01 > > NL step 23, |residual|_2 = 4.373921e-03 > > Line search: gnorm after quadratic fit 3.681355014898e-03 > > Line search: Quadratically determined step, > lambda=1.9626628361883081e-01 > > NL step 24, |residual|_2 = 3.681355e-03 > > Line search: gnorm after quadratic fit 2.594785108727e-03 > > Line search: Quadratically determined step, > lambda=3.8057573229158653e-01 > > NL step 25, |residual|_2 = 2.594785e-03 > > Line search: gnorm after quadratic fit 1.803191839408e-03 > > Line search: Quadratically determined step, > lambda=4.3574150080610474e-01 > > NL step 26, |residual|_2 = 1.803192e-03 > > Line search: Using full step: fnorm 1.803191839408e-03 gnorm > 9.015954497317e-04 > > NL step 27, |residual|_2 = 9.015954e-04 > > Line search: Using full step: fnorm 9.015954497317e-04 gnorm > 1.390181456520e-13 > > NL step 28, |residual|_2 = 1.390181e-13 > > Number of nonlinear iterations: 28 > > > > > > On Wed, Jan 13, 2016 at 3:48 PM, David Knezevic < > david.knezevic at akselos.com> wrote: > > I'm using NEWTONLS (with mumps for the linear solves) to do a nonlinear > PDE solve. It converges well when I use 1 core. When I use 2 or more cores, > the line search stagnates. I've pasted the output of > -snes_linesearch_monitor below in these two cases. > > > > I was wondering if this implies that I must have a bug in parallel, or > if perhaps the NEWTONLS solver can behave slightly differently in parallel? 
> > > > Thanks, > > David > > > > > --------------------------------------------------------------------------------------- > > > > > > > > Parallel case: > > NL step 0, |residual|_2 = 4.714515e-02 > > Line search: gnorm after quadratic fit 7.862867755323e-02 > > Line search: Cubically determined step, current gnorm > 4.663945043239e-02 lambda=1.4276549921126183e-02 > > NL step 1, |residual|_2 = 4.663945e-02 > > Line search: gnorm after quadratic fit 6.977268575068e-02 > > Line search: Cubically determined step, current gnorm > 4.594912794004e-02 lambda=2.3644825912085998e-02 > > NL step 2, |residual|_2 = 4.594913e-02 > > Line search: gnorm after quadratic fit 5.502067932478e-02 > > Line search: Cubically determined step, current gnorm > 4.494531294405e-02 lambda=4.1260497615261321e-02 > > NL step 3, |residual|_2 = 4.494531e-02 > > Line search: gnorm after quadratic fit 5.415371063247e-02 > > Line search: Cubically determined step, current gnorm > 4.392165925471e-02 lambda=3.6375618871780056e-02 > > NL step 4, |residual|_2 = 4.392166e-02 > > Line search: gnorm after quadratic fit 4.631663976615e-02 > > Line search: Cubically determined step, current gnorm > 4.246200798775e-02 lambda=5.0000000000000003e-02 > > NL step 5, |residual|_2 = 4.246201e-02 > > Line search: gnorm after quadratic fit 4.222105321728e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 6, |residual|_2 = 4.222105e-02 > > Line search: gnorm after quadratic fit 4.026081251872e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 7, |residual|_2 = 4.026081e-02 > > Line search: gnorm after quadratic fit 3.776439532346e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 8, |residual|_2 = 3.776440e-02 > > Line search: gnorm after quadratic fit 3.659796311121e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 9, |residual|_2 = 3.659796e-02 > > Line search: gnorm after quadratic fit 3.423207664901e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 10, |residual|_2 = 3.423208e-02 > > Line search: gnorm after quadratic fit 3.116928452225e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 11, |residual|_2 = 3.116928e-02 > > Line search: gnorm after quadratic fit 2.874310955274e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 12, |residual|_2 = 2.874311e-02 > > Line search: gnorm after quadratic fit 2.587826662305e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 13, |residual|_2 = 2.587827e-02 > > Line search: gnorm after quadratic fit 2.344161073075e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 14, |residual|_2 = 2.344161e-02 > > Line search: gnorm after quadratic fit 2.187719889554e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 15, |residual|_2 = 2.187720e-02 > > Line search: gnorm after quadratic fit 1.983089075086e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 16, |residual|_2 = 1.983089e-02 > > Line search: gnorm after quadratic fit 1.791227711151e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 17, |residual|_2 = 1.791228e-02 > > Line search: gnorm after quadratic fit 
1.613250573900e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 18, |residual|_2 = 1.613251e-02 > > Line search: gnorm after quadratic fit 1.455841843183e-02 > > Line search: Quadratically determined step, > lambda=1.0000000000000001e-01 > > NL step 19, |residual|_2 = 1.455842e-02 > > Line search: gnorm after quadratic fit 1.321849780208e-02 > > Line search: Quadratically determined step, > lambda=1.0574876450981290e-01 > > NL step 20, |residual|_2 = 1.321850e-02 > > Line search: gnorm after quadratic fit 9.209641609489e-03 > > Line search: Quadratically determined step, > lambda=3.0589684959139674e-01 > > NL step 21, |residual|_2 = 9.209642e-03 > > Line search: gnorm after quadratic fit 7.590942028574e-03 > > Line search: Quadratically determined step, > lambda=2.0920305898507460e-01 > > NL step 22, |residual|_2 = 7.590942e-03 > > Line search: gnorm after quadratic fit 4.373918927227e-03 > > Line search: Quadratically determined step, > lambda=4.2379743128074154e-01 > > NL step 23, |residual|_2 = 4.373919e-03 > > Line search: gnorm after quadratic fit 3.681351665911e-03 > > Line search: Quadratically determined step, > lambda=1.9626618428089049e-01 > > NL step 24, |residual|_2 = 3.681352e-03 > > Line search: gnorm after quadratic fit 2.594782418891e-03 > > Line search: Quadratically determined step, > lambda=3.8057533372167579e-01 > > NL step 25, |residual|_2 = 2.594782e-03 > > Line search: gnorm after quadratic fit 1.803188279452e-03 > > Line search: Quadratically determined step, > lambda=4.3574109448916826e-01 > > NL step 26, |residual|_2 = 1.803188e-03 > > Line search: Using full step: fnorm 1.803188279452e-03 gnorm > 9.015947319176e-04 > > NL step 27, |residual|_2 = 9.015947e-04 > > Line search: Using full step: fnorm 9.015947319176e-04 gnorm > 7.088879385731e-08 > > NL step 28, |residual|_2 = 7.088879e-08 > > Line search: gnorm after quadratic fit 7.088878906502e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088878957116e-08 lambda=2.1132490715284968e-01 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385683e-08 lambda=9.2196195824189087e-02 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385711e-08 lambda=4.0004532931495446e-02 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385722e-08 lambda=1.7374764617622523e-02 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385726e-08 lambda=7.5449542135114234e-03 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.2764749100364717e-03 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4228361655588414e-03 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.1787884492365153e-04 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.6831916265377548e-04 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.1651988987471248e-04 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.0599757911789984e-05 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.1973377296845284e-05 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.5421268734746417e-06 > > Line search: Cubic step no good, 
shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.1437501409853001e-06 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.7994589108402447e-06 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=7.8143041004756041e-07 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.3934283359762142e-07 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.4736252548330828e-07 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.3993436038104693e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.7789696481734489e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2067913185456743e-08 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.2405944320925838e-09 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.2757729177880525e-09 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=9.8827383810151057e-10 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=4.2916635989551390e-10 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.8636915940199893e-10 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=8.0932400164504977e-11 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=3.5145586412497970e-11 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.5262271250668997e-11 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=6.6277717206096633e-12 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=2.8781665100197773e-12 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=1.2498684035299616e-12 > > Line search: Cubic step no good, shrinking lambda, current gnorm > 7.088879385731e-08 lambda=5.4276603549660526e-13 > > Line search: unable to find good step length! After 33 tries > > Line search: fnorm=7.0888793857309783e-08, > gnorm=7.0888793857309783e-08, ynorm=2.4650076775058285e-08, > minlambda=9.9999999999999998e-13, lambda=5.4276603549660526e-13, initial > slope=-5.0252210945441613e-15 > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Wed Jan 13 20:01:33 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 02:01:33 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X Message-ID: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Hi Folks, I am trying to profile my application code that uses a lot of PETSc solvers. I am running applications on OS X - Yosemite. I am thinking of using HPCToolKit for the purpose, but could not find a dmg package for that. I have access to a remote linux machine that has HPCToolkit and HPCViewer installed on it ? so I just need to have a viewer on my local Mac machine to analyze the files generate by HPCToolkit. Has anyone tried building/installing these packages on OS X? Thanks, ? 
Amneet -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 13 20:22:22 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 13 Jan 2016 20:22:22 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Message-ID: On Wed, Jan 13, 2016 at 8:01 PM, Bhalla, Amneet Pal S wrote: > > Hi Folks, > > I am trying to profile my application code that uses a lot of PETSc > solvers. I am running applications on OS X - Yosemite. I am thinking > of using HPCToolKit for the purpose, but could not find a dmg package for > that. I have access to a remote linux machine that has HPCToolkit > and HPCViewer installed on it ? so I just need to have a viewer on my > local Mac machine to analyze the files generate by HPCToolkit. > Has anyone tried building/installing these packages on OS X? > I have not done it on OSX. Can you mail us a -log_summary for a rough cut? Sometimes its hard to interpret the data avalanche from one of those tools without a simple map. Thanks, Matt > Thanks, > > ? Amneet > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jan 13 20:59:05 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 20:59:05 -0600 Subject: [petsc-users] [petsc-maint] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Message-ID: <6571084C-52AF-4CD4-B96A-A9EECB924060@mcs.anl.gov> The Instruments tool on the Mac, part of Xcode is trivial to use (you don't need to use Xcode GUI to build) and seems to provide useful information. Barry > On Jan 13, 2016, at 8:22 PM, Matthew Knepley wrote: > > On Wed, Jan 13, 2016 at 8:01 PM, Bhalla, Amneet Pal S wrote: > > Hi Folks, > > I am trying to profile my application code that uses a lot of PETSc solvers. I am running applications on OS X - Yosemite. I am thinking > of using HPCToolKit for the purpose, but could not find a dmg package for that. I have access to a remote linux machine that has HPCToolkit > and HPCViewer installed on it ? so I just need to have a viewer on my local Mac machine to analyze the files generate by HPCToolkit. > Has anyone tried building/installing these packages on OS X? > > I have not done it on OSX. Can you mail us a -log_summary for a rough cut? Sometimes its hard > to interpret the data avalanche from one of those tools without a simple map. > > Thanks, > > Matt > > Thanks, > > ? Amneet > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From jychang48 at gmail.com Wed Jan 13 21:05:44 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 20:05:44 -0700 Subject: [petsc-users] Difference between Block Jacobi and ILU? Message-ID: Hi all, What exactly is the difference between these two preconditioners? When I use them to solve a Galerkin finite element poisson problem, I get the exact same performance (iterations, wall-clock time, etc). Only thing is I can't seem to use ILU in parallel though. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Wed Jan 13 21:26:30 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 13 Jan 2016 21:26:30 -0600 Subject: [petsc-users] Difference between Block Jacobi and ILU? In-Reply-To: References: Message-ID: On Wed, 13 Jan 2016, Justin Chang wrote: > Hi all, > > What exactly is the difference between these two preconditioners? When I > use them to solve a Galerkin finite element poisson problem, I get the > exact same performance (iterations, wall-clock time, etc). you mean - when you run sequentially? With block jacobi - you decide the number of blocks. The default is 1-block/proc i.e - for sequnetial run you have only 1block i.e the whole matrix. So the following are essentially the same: -pc_type bjacobi -pc_bjacobi_blocks 1 [default] -sub_pc_type ilu [default] -pc_type ilu Satish > Only thing is I can't seem to use ILU in parallel though. From jychang48 at gmail.com Wed Jan 13 21:37:12 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 20:37:12 -0700 Subject: [petsc-users] Difference between Block Jacobi and ILU? In-Reply-To: References: Message-ID: Thanks Satish, And yes I meant sequentially. On Wed, Jan 13, 2016 at 8:26 PM, Satish Balay wrote: > On Wed, 13 Jan 2016, Justin Chang wrote: > > > Hi all, > > > > What exactly is the difference between these two preconditioners? When I > > use them to solve a Galerkin finite element poisson problem, I get the > > exact same performance (iterations, wall-clock time, etc). > > you mean - when you run sequentially? > > With block jacobi - you decide the number of blocks. The default is > 1-block/proc > i.e - for sequnetial run you have only 1block i.e the whole matrix. > > So the following are essentially the same: > -pc_type bjacobi -pc_bjacobi_blocks 1 [default] -sub_pc_type ilu [default] > -pc_type ilu > > Satish > > > Only thing is I can't seem to use ILU in parallel though. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Wed Jan 13 21:57:38 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 20:57:38 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? Message-ID: Hi all, 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Jan 13 22:12:20 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 22:12:20 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: Message-ID: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > Hi all, > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices. > > Thanks, > Justin From jychang48 at gmail.com Wed Jan 13 22:24:46 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 21:24:46 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: Thanks Barry, 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth? 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ? Thanks, Justin On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > Hi all, > > > > 1) I am guessing MATMPIBAIJ could theoretically have better performance > than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning > that block (dense) matrix-vector multiply is "faster" than simple > matrix-vector? > > See for example table 1 in > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > 2) I am looking through the manual and online documentation and it seems > the term "block" used everywhere. In the section on "block matrices" (3.1.3 > of the manual), it refers to field splitting, where you could either have a > monolithic matrix or a nested matrix. Does that concept have anything to do > with MATMPIBAIJ? > > Unfortunately the numerical analysis literature uses the term block in > multiple ways. 
For small blocks, sometimes called "point-block" with BAIJ > and for very large blocks (where the blocks are sparse themselves). I used > fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > It makes sense to me that one could create a BAIJ where if you have 5 > dofs of the same type of physics (e.g., five different primary species of a > geochemical reaction) per grid point, you could create a block size of 5. > And if you have different physics (e.g., velocity and pressure) you would > ideally want to separate them out (i.e., nested matrices) for better > preconditioning. > > Sometimes you put them together with BAIJ and sometimes you keep them > separate with nested matrices. > > > > > Thanks, > > Justin > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Wed Jan 13 22:42:21 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Wed, 13 Jan 2016 23:42:21 -0500 Subject: [petsc-users] compiler error Message-ID: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> I haven?t seen this before: /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); ^ /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Jan 13 22:54:02 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 13 Jan 2016 22:54:02 -0600 Subject: [petsc-users] compiler error In-Reply-To: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: On Wed, 13 Jan 2016, Gideon Simpson wrote: > I haven?t seen this before: > > /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > ^ > > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); Try: PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); Satish From bsmith at mcs.anl.gov Wed Jan 13 23:12:18 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 13 Jan 2016 23:12:18 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: > > Thanks Barry, > > 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? 
Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth? There are two distinct reasons for the improvement: 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" savings is that you have to load something that is much smaller than before. Cache/spatial locality have nothing to do with this particular improvement. 2) The other improvement comes from the reuse of each x[j] value multiplied by 5 values (a column) of the little block. The hardware explanation is that x[j] can be reused in a register for the 5 multiplies (while otherwise it would have to come from cache to register 5 times and sometimes might even have been flushed from the cache so would have to come from memory). This is why we have code like for (j=0; j > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ? Sure, but if you have separated all the variables of pressure, velocity_x, velocity_y, etc into there own regions of the vector then the block size for the sub matrices would be 1 so BAIJ does not help. There are Stokes solvers that use Vanka smoothing that keep the variables interlaced and hence would use BAIJ and NOT use fieldsplit > > Thanks, > Justin > > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > Hi all, > > > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? > > See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? > > Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. > > Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices. 
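The inner loop shown above ("This is why we have code like for (j=0; j ...") was mangled by the list's HTML scrubbing. Below is a sketch reconstructing the 5x5 block-row product from the surviving fragments; the function wrapper, the declarations, and the loop bound n (the number of 5x5 blocks in the row) are assumptions added so the fragment is self-contained, not PETSc's verbatim source.

#include <petscsys.h>

/* Sketch: accumulate one block row of y = A*x for 5x5 blocks.
   Each x value is pulled into a register once and reused for the
   five multiplies of its block column. */
static void BlockRowMult5(PetscInt n, const PetscInt *idx,
                          const PetscScalar *v, const PetscScalar *x,
                          PetscScalar sum[5])
{
  PetscScalar       sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0;
  PetscScalar       x1, x2, x3, x4, x5;
  const PetscScalar *xb;
  PetscInt          j;

  for (j = 0; j < n; j++) {      /* n = number of 5x5 blocks in this row (assumed) */
    xb = x + 5*(*idx++);         /* the 5 x-values for this block column */
    x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
    sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5;
    sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5;
    sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5;
    sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5;
    sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5;
    v += 25;                     /* advance to the next dense 5x5 block */
  }
  sum[0] = sum1; sum[1] = sum2; sum[2] = sum3; sum[3] = sum4; sum[4] = sum5;
}

The single column index per block (rather than per scalar entry) is also where the 1/25th-sized ja array in point 1) comes from.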
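For the point-block case discussed in this thread (e.g. five species per grid point, so block size 5), a minimal creation sketch follows; the function name, the preallocation counts, and nlocal_nodes are illustrative assumptions, not taken from the thread.

#include <petscmat.h>

/* Sketch: create a point-block matrix with 5 dofs per node (bs = 5). */
static PetscErrorCode CreateSpeciesMatrix(MPI_Comm comm, PetscInt nlocal_nodes, Mat *A)
{
  const PetscInt bs = 5;
  PetscErrorCode ierr;

  ierr = MatCreate(comm, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, bs*nlocal_nodes, bs*nlocal_nodes,
                     PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATBAIJ);CHKERRQ(ierr);  /* resolves to SEQBAIJ or MPIBAIJ */
  /* block-row preallocation; the counts here are rough guesses for a 2D stencil */
  ierr = MatSeqBAIJSetPreallocation(*A, bs, 9, NULL);CHKERRQ(ierr);
  ierr = MatMPIBAIJSetPreallocation(*A, bs, 9, NULL, 4, NULL);CHKERRQ(ierr);
  return 0;
}

Entries then go in one 5x5 block at a time with block (node) indices, e.g. MatSetValuesBlocked(A, 1, &brow, 1, &bcol, blockvals, ADD_VALUES) with blockvals holding the dense 5x5 coupling (row-oriented by default), followed by MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY). For an interlaced velocity-pressure layout (the Vanka-smoother case mentioned above) the same pattern applies with the corresponding block size, while a fully segregated layout would instead use nested AIJ blocks with fieldsplit.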
> > > > > Thanks, > > Justin > > From amneetb at live.unc.edu Wed Jan 13 23:12:46 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 05:12:46 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> Message-ID: <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> On Jan 13, 2016, at 6:22 PM, Matthew Knepley > wrote: Can you mail us a -log_summary for a rough cut? Sometimes its hard to interpret the data avalanche from one of those tools without a simple map. Does this indicate some hot spots? ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016 Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: 2015-11-16 13:07:08 -0600 Max Max/Min Avg Total Time (sec): 1.039e+01 1.00000 1.039e+01 Objects: 2.834e+03 1.00000 2.834e+03 Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 Memory: 3.949e+07 1.00000 3.949e+07 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. 
# # # ########################################################## Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 
1 0 0 0 249 KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 870 762 13314200 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951096 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24202324 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 190080 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89128 0. SNES 1 1 1328 0. SNESLineSearch 1 1 856 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9024 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1696 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 4.74e-08 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_ksp_richardson_self_scae -stokes_ib_pc_level_ksp_type gmres -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_pc_asm_type interpolate -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ ----------------------------------------- Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit Using PETSc directory: /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc Using PETSc arch: darwin-dbg ----------------------------------------- Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using 
libraries: -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lpetsc -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -lclang_rt.osx -lmpicxx -lc++ -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl -lcrypto -lmpifort -lgfortran -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -ldl ----------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Wed Jan 13 23:17:21 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 05:17:21 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> I see one hot spot: On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S > wrote: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Jan 13 23:22:18 2016 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 13 Jan 2016 21:22:18 -0800 Subject: [petsc-users] osx configuration error In-Reply-To: References: Message-ID: Thanks Satish, this worked. 
On Wed, Jan 13, 2016 at 9:49 AM, Satish Balay wrote: > >>>>>>>> > Executing: mpif90 -o > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest.o > Testing executable > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > to see if it can be run > Executing: > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > Executing: > /var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest > ERROR while running executable: Could not execute > "/var/folders/sw/67cq0mmx43g93vrb5xkf1j7c0000gn/T/petsc-2z06LS/config.setCompilers/conftest": > dyld: Library not loaded: > /Users/markadams/homebrew/lib/gcc/x86_64-apple-darwin13.4.0/4.9.1/libgfortran.3.dylib > Referenced from: /Users/markadams/homebrew/lib/libmpifort.12.dylib > Reason: image not found > <<<<<<<<<<< > > Mostlikely you haven't reinstalled mpich - as its refering to > gfortran-4.9.1. Current gfortran is 5.3 > GNU Fortran (Homebrew gcc 5.3.0) 5.3.0 > > > This is what I would do to reinstall brew > > 1. Make list of pkgs to reinstall > > brew leaves > reinstall.lst > > 2. delete all installed brew pacakges. > > brew cleanup > brew list > delete.lst > brew remove `cat delete.lst > > 3. Now reinstall all required packages > brew update > brew install `cat reinstall.lst` > > > Satish > > > On Wed, 13 Jan 2016, Mark Adams wrote: > > > I'm still having problems. I have upgraded gcc and mpich. I am now > > upgrading everything from homebrew. Any ideas on this error? > > thanks, > > > > On Wed, Jan 13, 2016 at 2:15 AM, Matthew Knepley > wrote: > > > > > On Tue, Jan 12, 2016 at 6:31 PM, Satish Balay > wrote: > > > > > >> > 'file' object has no attribute 'getvalue' File > > >> "/Users/markadams/Codes/petsc/config/configure.py", line 363, in > > >> petsc_configure > > >> > > >> Hm - have to figure this one out - but the primary issue is: > > >> > > >> > stderr: > > >> > gfortran: warning: couldn't understand kern.osversion '15.2.0 > > >> > ld: -rpath can only be used when targeting Mac OS X 10.5 or later > > >> > > > > > > I get this. The remedy I use is to put > > > > > > MACOSX_DEPLOYMENT_TARGET=10.5 > > > > > > in the environment. Its annoying, and quintessentially Mac. > > > > > > Matt > > > > > > > > >> Perhaps you've updated xcode or OSX - but did not reinstall > brew/gfortran. > > >> > > >> > Executing: mpif90 --version > > >> > stdout: > > >> > GNU Fortran (Homebrew gcc 4.9.1) 4.9.1 > > >> > > >> I suggest uninstalling/reinstalling homebrew packages. > > >> > > >> Satish > > >> > > >> > > >> > > >> On Tue, 12 Jan 2016, Mark Adams wrote: > > >> > > >> > I did nuke the arch directory. This has worked in the past and > don't > > >> know > > >> > what I might have changed. it's been awhile since I've > reconfigured. > > >> > Thanks, > > >> > Mark > > >> > > > >> > > >> > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > their > > > experiments lead. > > > -- Norbert Wiener > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From praveenpetsc at gmail.com Thu Jan 14 00:03:52 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Thu, 14 Jan 2016 11:33:52 +0530 Subject: [petsc-users] undefined reference error in make test Message-ID: I?ve written a fortan code (F90) for domain decomposition.* I've specified **the paths of include files and libraries, but the compiler/linker still * *complained about undefined references.undefined reference to `vectorset_'undefined reference to `dmdagetlocalinfo_'*I?m attaching makefile and code. any help will be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 326 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.F90 Type: text/x-fortran Size: 3078 bytes Desc: not available URL: From jychang48 at gmail.com Thu Jan 14 00:05:59 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 23:05:59 -0700 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: HPCToolkit for MacOSX doesn't require any installation. Just go to: http://hpctoolkit.org/download/hpcviewer/ and download this file: hpctraceviewer-5.4.2-r20160111-macosx.cocoa.x86_64.zip Important note: be sure to unzip the file via the terminal, not with Finder. It may screw up the GUI. On my MacOSX I had to "Download Linked File As..." Then you can drag the corresponding hpctraceviewer.app into your Applications directory. Now you *should* be good to go. Thanks, Justin On Wed, Jan 13, 2016 at 10:17 PM, Griffith, Boyce Eugene < boyceg at email.unc.edu> wrote: > I see one hot spot: > > On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S > wrote: > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Jan 14 00:13:31 2016 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 13 Jan 2016 23:13:31 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: Okay that makes sense, thanks On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: > > > > Thanks Barry, > > > > 1) So for block matrices, the ja array is smaller. But what's the > "hardware" explanation for this performance improvement? Does it have to do > with spatial locality where you are more likely to reuse data in that ja > array, or does it have to do with the fact that loading/storing smaller > arrays are less likely to invoke a cache miss, thus reducing the amount of > bandwidth? > > There are two distinct reasons for the improvement: > > 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" > savings is that you have to load something that is much smaller than > before. 
Cache/spatial locality have nothing to do with this particular > improvement. > > 2) The other improvement comes from the reuse of each x[j] value > multiplied by 5 values (a column) of the little block. The hardware > explanation is that x[j] can be reused in a register for the 5 multiplies > (while otherwise it would have to come from cache to register 5 times and > sometimes might even have been flushed from the cache so would have to come > from memory). This is why we have code like > > for (j=0; j xb = x + 5*(*idx++); > x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; > sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; > sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; > sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; > sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; > sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; > v += 25; > } > > to do the block multiple. > > > > > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of > more than one dof per point) then using the BAIJ format is highly > advisable. But if I want to form a nested matrix, say I am solving Stokes > equation, then each "submatrix" is of AIJ format? Can these sub matrices > also be BAIJ? > > Sure, but if you have separated all the variables of pressure, > velocity_x, velocity_y, etc into there own regions of the vector then the > block size for the sub matrices would be 1 so BAIJ does not help. > > There are Stokes solvers that use Vanka smoothing that keep the > variables interlaced and hence would use BAIJ and NOT use fieldsplit > > > > > > Thanks, > > Justin > > > > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > > > Hi all, > > > > > > 1) I am guessing MATMPIBAIJ could theoretically have better > performance than simply using MATMPIAIJ. Why is that? Is it similar to the > reasoning that block (dense) matrix-vector multiply is "faster" than simple > matrix-vector? > > > > See for example table 1 in > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > > > > 2) I am looking through the manual and online documentation and it > seems the term "block" used everywhere. In the section on "block matrices" > (3.1.3 of the manual), it refers to field splitting, where you could either > have a monolithic matrix or a nested matrix. Does that concept have > anything to do with MATMPIBAIJ? > > > > Unfortunately the numerical analysis literature uses the term block > in multiple ways. For small blocks, sometimes called "point-block" with > BAIJ and for very large blocks (where the blocks are sparse themselves). I > used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > > > It makes sense to me that one could create a BAIJ where if you have 5 > dofs of the same type of physics (e.g., five different primary species of a > geochemical reaction) per grid point, you could create a block size of 5. > And if you have different physics (e.g., velocity and pressure) you would > ideally want to separate them out (i.e., nested matrices) for better > preconditioning. > > > > Sometimes you put them together with BAIJ and sometimes you keep them > separate with nested matrices. > > > > > > > > Thanks, > > > Justin > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amneetb at live.unc.edu Thu Jan 14 00:19:42 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 06:19:42 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: Thanks! That worked for me. On Jan 13, 2016, at 10:05 PM, Justin Chang > wrote: HPCToolkit for MacOSX doesn't require any installation. Just go to: http://hpctoolkit.org/download/hpcviewer/ and download this file: hpctraceviewer-5.4.2-r20160111-macosx.cocoa.x86_64.zip Important note: be sure to unzip the file via the terminal, not with Finder. It may screw up the GUI. On my MacOSX I had to "Download Linked File As..." Then you can drag the corresponding hpctraceviewer.app into your Applications directory. Now you *should* be good to go. Thanks, Justin On Wed, Jan 13, 2016 at 10:17 PM, Griffith, Boyce Eugene > wrote: I see one hot spot: On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S > wrote: ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 14 01:26:57 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 14 Jan 2016 07:26:57 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene > wrote: I see one hot spot: Here is with opt build ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 Max Max/Min Avg Total Time (sec): 1.018e+00 1.00000 1.018e+00 Objects: 2.935e+03 1.00000 2.935e+03 Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 MatPtAP 4 1.0 4.4426e-02 
1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 971 839 15573352 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951928 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24083332 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 122720 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89872 0. SNES 1 1 1328 0. SNESLineSearch 1 1 984 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9168 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1712 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-07 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_0_sub_pc_type ilu -stokes_ib_pc_level_ksp_richardson_self_scale -stokes_ib_pc_level_ksp_type richardson -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc Using PETSc arch: linux-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl ----------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hgbk2008 at gmail.com Thu Jan 14 05:04:35 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Thu, 14 Jan 2016 12:04:35 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: This is a very interesting thread because use of block matrix improves the performance of AMG a lot. In my case is the elasticity problem. One more question I like to ask, which is more on the performance of the solver. That if I have a coupled problem, says the point block is [u_x u_y u_z p] in which entries of p block in stiffness matrix is in a much smaller scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? Also, is there a utility in PETSc which does automatic scaling of variables? Giang On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang wrote: > Okay that makes sense, thanks > > On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: > >> >> > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: >> > >> > Thanks Barry, >> > >> > 1) So for block matrices, the ja array is smaller. But what's the >> "hardware" explanation for this performance improvement? Does it have to do >> with spatial locality where you are more likely to reuse data in that ja >> array, or does it have to do with the fact that loading/storing smaller >> arrays are less likely to invoke a cache miss, thus reducing the amount of >> bandwidth? >> >> There are two distinct reasons for the improvement: >> >> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" >> savings is that you have to load something that is much smaller than >> before. Cache/spatial locality have nothing to do with this particular >> improvement. >> >> 2) The other improvement comes from the reuse of each x[j] value >> multiplied by 5 values (a column) of the little block. The hardware >> explanation is that x[j] can be reused in a register for the 5 multiplies >> (while otherwise it would have to come from cache to register 5 times and >> sometimes might even have been flushed from the cache so would have to come >> from memory). This is why we have code like >> >> for (j=0; j> xb = x + 5*(*idx++); >> x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; >> sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; >> sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; >> sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; >> sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; >> sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; >> v += 25; >> } >> >> to do the block multiple. >> >> > >> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation >> of more than one dof per point) then using the BAIJ format is highly >> advisable. But if I want to form a nested matrix, say I am solving Stokes >> equation, then each "submatrix" is of AIJ format? Can these sub matrices >> also be BAIJ? >> >> Sure, but if you have separated all the variables of pressure, >> velocity_x, velocity_y, etc into there own regions of the vector then the >> block size for the sub matrices would be 1 so BAIJ does not help. 
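As a concrete illustration of the point-block case discussed above, here is a
minimal sketch of assembling a block-size-5 BAIJ matrix. The sizes,
preallocation counts, and the single inserted block are made up purely for
illustration, and CHKERRQ error checking is omitted for brevity:

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      PetscInt    bs = 5, nblocks = 10;   /* 10 block rows/cols = 50 scalar rows/cols */
      PetscInt    row = 0, col = 0, i;
      PetscScalar block[25];              /* one 5x5 block, row-oriented by default   */

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* d_nz/o_nz preallocation is counted in blocks per block row */
      MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE,
                    bs*nblocks, bs*nblocks, 5, NULL, 2, NULL, &A);

      for (i = 0; i < 25; i++) block[i] = (PetscScalar)i;

      /* indices are in units of blocks, not scalar rows/columns */
      MatSetValuesBlocked(A, 1, &row, 1, &col, block, INSERT_VALUES);

      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }

With the matrix stored this way, MatMult() can use a block-size-5 kernel of
the kind quoted above, which is where the smaller column-index array and the
register reuse of each x[j] come from.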
>> >> There are Stokes solvers that use Vanka smoothing that keep the >> variables interlaced and hence would use BAIJ and NOT use fieldsplit >> >> >> > >> > Thanks, >> > Justin >> > >> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith >> wrote: >> > >> > > On Jan 13, 2016, at 9:57 PM, Justin Chang >> wrote: >> > > >> > > Hi all, >> > > >> > > 1) I am guessing MATMPIBAIJ could theoretically have better >> performance than simply using MATMPIAIJ. Why is that? Is it similar to the >> reasoning that block (dense) matrix-vector multiply is "faster" than simple >> matrix-vector? >> > >> > See for example table 1 in >> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf >> > >> > > >> > > 2) I am looking through the manual and online documentation and it >> seems the term "block" used everywhere. In the section on "block matrices" >> (3.1.3 of the manual), it refers to field splitting, where you could either >> have a monolithic matrix or a nested matrix. Does that concept have >> anything to do with MATMPIBAIJ? >> > >> > Unfortunately the numerical analysis literature uses the term block >> in multiple ways. For small blocks, sometimes called "point-block" with >> BAIJ and for very large blocks (where the blocks are sparse themselves). I >> used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. >> > > >> > > It makes sense to me that one could create a BAIJ where if you have 5 >> dofs of the same type of physics (e.g., five different primary species of a >> geochemical reaction) per grid point, you could create a block size of 5. >> And if you have different physics (e.g., velocity and pressure) you would >> ideally want to separate them out (i.e., nested matrices) for better >> preconditioning. >> > >> > Sometimes you put them together with BAIJ and sometimes you keep >> them separate with nested matrices. >> > >> > > >> > > Thanks, >> > > Justin >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 14 07:24:54 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 07:24:54 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S wrote: > > > On Jan 13, 2016, at 6:22 PM, Matthew Knepley wrote: > > Can you mail us a -log_summary for a rough cut? Sometimes its hard > to interpret the data avalanche from one of those tools without a simple > map. > > > Does this indicate some hot spots? > 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae You can try to avoid this by giving -options_left 2) Are you using any custom code during the solve? There is a gaping whole in the timing. It take 9s to do PCApply(), but something like a collective 1s to do everything we time under that. Since this is serial, we can use something like kcachegrind to look at performance as well, which should at least tell us what is sucking up this time so we can put a PETSc even on it. Thanks, Matt > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, > by Taylor Wed Jan 13 21:07:43 2016 > Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: > 2015-11-16 13:07:08 -0600 > > Max Max/Min Avg Total > Time (sec): 1.039e+01 1.00000 1.039e+01 > Objects: 2.834e+03 1.00000 2.834e+03 > Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 > Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 > Memory: 3.949e+07 1.00000 3.949e+07 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. 
# > # # > ########################################################## > > > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 > VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 > VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 > VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 > VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 > VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 > VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 > VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 > VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 > VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 > BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 > MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 > MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 > MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 > MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 > MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 > MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 > MatGetSymTrans 4 1.0 
5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 249 > KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 > PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 > PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 > PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 > SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 > SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 > SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 > > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 870 762 13314200 0. > Vector Scatter 290 289 189584 0. > Index Set 1171 823 951096 0. > IS L to G Mapping 110 109 2156656 0. > Application Order 6 6 99952 0. > MatMFFD 1 1 776 0. > Matrix 189 189 24202324 0. > Matrix Null Space 4 4 2432 0. > Krylov Solver 90 90 190080 0. > DMKSP interface 1 1 648 0. > Preconditioner 90 90 89128 0. > SNES 1 1 1328 0. > SNESLineSearch 1 1 856 0. > DMSNES 1 1 664 0. > Distributed Mesh 2 2 9024 0. > Star Forest Bipartite Graph 4 4 3168 0. > Discrete System 2 2 1696 0. > Viewer 1 0 0 0. 
> > ======================================================================================================================== > Average time to get PetscTime(): 4.74e-08 > #PETSc Option Table entries: > -ib_ksp_converged_reason > -ib_ksp_monitor_true_residual > -ib_snes_type ksponly > -log_summary > -stokes_ib_pc_level_ksp_richardson_self_scae > -stokes_ib_pc_level_ksp_type gmres > -stokes_ib_pc_level_pc_asm_local_type additive > -stokes_ib_pc_level_pc_asm_type interpolate > -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal > -stokes_ib_pc_level_sub_pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 > --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 > --with-hypre=1 --download-hypre=1 --with-hdf5=yes > --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ > ----------------------------------------- > Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu > Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit > Using PETSc directory: > /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc > Using PETSc arch: darwin-dbg > ----------------------------------------- > > Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include > -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include > -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include > -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -lpetsc > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib > -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib > -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib > -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin > -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin > -lclang_rt.osx -lmpicxx -lc++ > -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib > -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl > -lcrypto -lmpifort -lgfortran > -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 > -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 > -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran > -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx 
-lc++ -lclang_rt.osx > -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib > -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem > -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin > -lclang_rt.osx -ldl > ----------------------------------------- > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu Jan 14 07:37:11 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 14 Jan 2016 14:37:11 +0100 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: On 14 January 2016 at 14:24, Matthew Knepley wrote: > On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S < > amneetb at live.unc.edu> wrote: > >> >> >> On Jan 13, 2016, at 6:22 PM, Matthew Knepley wrote: >> >> Can you mail us a -log_summary for a rough cut? Sometimes its hard >> to interpret the data avalanche from one of those tools without a simple >> map. >> >> >> Does this indicate some hot spots? >> > > 1) There is a misspelled option -stokes_ib_pc_level_ksp_ > richardson_self_scae > > You can try to avoid this by giving -options_left > > 2) Are you using any custom code during the solve? There is a gaping whole > in the timing. It take 9s to > do PCApply(), but something like a collective 1s to do everything we > time under that. > You are looking at the timing from a debug build. The results from the optimized build don't have such a gaping hole. > > Since this is serial, we can use something like kcachegrind to look at > performance as well, which should > at least tell us what is sucking up this time so we can put a PETSc even > on it. > > Thanks, > > Matt > > > >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 >> processor, by Taylor Wed Jan 13 21:07:43 2016 >> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: >> 2015-11-16 13:07:08 -0600 >> >> Max Max/Min Avg Total >> Time (sec): 1.039e+01 1.00000 1.039e+01 >> Objects: 2.834e+03 1.00000 2.834e+03 >> Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 >> Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 >> Memory: 3.949e+07 1.00000 3.949e+07 >> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 >> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() >> and PetscLogStagePop(). >> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message >> lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> >> ########################################################## >> # # >> # WARNING!!! # >> # # >> # This code was compiled with a debugging option, # >> # To get timing results run ./configure # >> # using --with-debugging=no, the performance will # >> # be generally two or three times faster. 
# >> # # >> ########################################################## >> >> >> Event Count Time (sec) Flops >> --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 >> VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 >> VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 >> VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 >> VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 >> VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >> VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 >> VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 >> VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 >> VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 >> BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 >> MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 >> MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 >> MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 >> MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >> MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 >> MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 
0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 >> MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 1 0 0 0 0 1 0 0 0 249 >> KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >> PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 >> PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 >> PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 >> SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >> SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 >> SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Vector 870 762 13314200 0. >> Vector Scatter 290 289 189584 0. >> Index Set 1171 823 951096 0. >> IS L to G Mapping 110 109 2156656 0. >> Application Order 6 6 99952 0. >> MatMFFD 1 1 776 0. >> Matrix 189 189 24202324 0. >> Matrix Null Space 4 4 2432 0. >> Krylov Solver 90 90 190080 0. >> DMKSP interface 1 1 648 0. >> Preconditioner 90 90 89128 0. >> SNES 1 1 1328 0. >> SNESLineSearch 1 1 856 0. >> DMSNES 1 1 664 0. >> Distributed Mesh 2 2 9024 0. >> Star Forest Bipartite Graph 4 4 3168 0. >> Discrete System 2 2 1696 0. >> Viewer 1 0 0 0. 
>> >> ======================================================================================================================== >> Average time to get PetscTime(): 4.74e-08 >> #PETSc Option Table entries: >> -ib_ksp_converged_reason >> -ib_ksp_monitor_true_residual >> -ib_snes_type ksponly >> -log_summary >> -stokes_ib_pc_level_ksp_richardson_self_scae >> -stokes_ib_pc_level_ksp_type gmres >> -stokes_ib_pc_level_pc_asm_local_type additive >> -stokes_ib_pc_level_pc_asm_type interpolate >> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >> -stokes_ib_pc_level_sub_pc_type lu >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 >> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 >> --with-hypre=1 --download-hypre=1 --with-hdf5=yes >> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ >> ----------------------------------------- >> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu >> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit >> Using PETSc directory: >> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc >> Using PETSc arch: darwin-dbg >> ----------------------------------------- >> >> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} >> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} >> ----------------------------------------- >> >> Using include paths: >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include >> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -lpetsc >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib >> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >> -lclang_rt.osx -lmpicxx -lc++ >> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib >> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl >> -lcrypto -lmpifort -lgfortran >> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 
-lgfortran >> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx >> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem >> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >> -lclang_rt.osx -ldl >> ----------------------------------------- >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Thu Jan 14 07:39:17 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Thu, 14 Jan 2016 08:39:17 -0500 Subject: [petsc-users] compiler error In-Reply-To: References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: I know I did a git pull recently, but when did that change? What?s the fifth argument represent? -gideon > On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > > On Wed, 13 Jan 2016, Gideon Simpson wrote: > >> I haven?t seen this before: >> >> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); >> ^ >> >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > > Try: > > PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > > Satish -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 14 07:44:47 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 07:44:47 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: On Thu, Jan 14, 2016 at 7:37 AM, Dave May wrote: > > > On 14 January 2016 at 14:24, Matthew Knepley wrote: > >> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S < >> amneetb at live.unc.edu> wrote: >> >>> >>> >>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley wrote: >>> >>> Can you mail us a -log_summary for a rough cut? Sometimes its hard >>> to interpret the data avalanche from one of those tools without a simple >>> map. >>> >>> >>> Does this indicate some hot spots? >>> >> >> 1) There is a misspelled option -stokes_ib_pc_level_ksp_ >> richardson_self_scae >> >> You can try to avoid this by giving -options_left >> >> 2) Are you using any custom code during the solve? There is a gaping >> whole in the timing. It take 9s to >> do PCApply(), but something like a collective 1s to do everything we >> time under that. >> > > > You are looking at the timing from a debug build. > The results from the optimized build don't have such a gaping hole. > It still looks like 50% of the runtime to me. 
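If the unaccounted time does turn out to be in application code rather than
inside PETSc, one way to make it visible in -log_summary is to bracket the
suspect code with a user-defined logging event. A minimal sketch, following
the pattern in the PETSc users' manual (the class and event names here are
invented for illustration):

    #include <petscsys.h>

    static PetscLogEvent USER_EVENT;

    PetscErrorCode TimeUserCode(void)
    {
      PetscErrorCode ierr;
      PetscClassId   classid;

      PetscFunctionBeginUser;
      /* registration: typically done once during setup, after PetscInitialize() */
      ierr = PetscClassIdRegister("Application", &classid);CHKERRQ(ierr);
      ierr = PetscLogEventRegister("UserIBOps", classid, &USER_EVENT);CHKERRQ(ierr);

      /* wrap the code that is currently invisible to the profiler */
      ierr = PetscLogEventBegin(USER_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
      /* ... the application code in question ... */
      ierr = PetscLogEventEnd(USER_EVENT, 0, 0, 0, 0);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The event then gets its own row in the -log_summary table (time, flops if
PetscLogFlops() is called, and percent of total run time), which makes holes
like the one above much easier to attribute.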
Matt > >> Since this is serial, we can use something like kcachegrind to look at >> performance as well, which should >> at least tell us what is sucking up this time so we can put a PETSc even >> on it. >> >> Thanks, >> >> Matt >> >> >> >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >>> -fCourier9' to print this document *** >>> >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance >>> Summary: ---------------------------------------------- >>> >>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 >>> processor, by Taylor Wed Jan 13 21:07:43 2016 >>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: >>> 2015-11-16 13:07:08 -0600 >>> >>> Max Max/Min Avg Total >>> Time (sec): 1.039e+01 1.00000 1.039e+01 >>> Objects: 2.834e+03 1.00000 2.834e+03 >>> Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 >>> Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 >>> Memory: 3.949e+07 1.00000 3.949e+07 >>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.00000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type >>> (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N >>> --> 2N flops >>> and VecAXPY() for complex vectors of length >>> N --> 8N flops >>> >>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >>> --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total counts >>> %Total Avg %Total counts %Total >>> 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 >>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on >>> interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flops: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all >>> processors >>> Mess: number of messages sent >>> Avg. len: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() >>> and PetscLogStagePop(). >>> %T - percent time in this phase %F - percent flops in this >>> phase >>> %M - percent messages in this phase %L - percent message >>> lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >>> over all processors) >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> >>> ########################################################## >>> # # >>> # WARNING!!! # >>> # # >>> # This code was compiled with a debugging option, # >>> # To get timing results run ./configure # >>> # using --with-debugging=no, the performance will # >>> # be generally two or three times faster. 
# >>> # # >>> ########################################################## >>> >>> >>> Event Count Time (sec) Flops >>> --- Global --- --- Stage --- Total >>> Max Ratio Max Ratio Max Ratio Mess Avg len >>> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 >>> VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 >>> VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 >>> VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 >>> VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 >>> VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 >>> VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 >>> VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 >>> VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 >>> VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 >>> BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 >>> MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 >>> MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 >>> MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 >>> MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 >>> MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 
1 0 0 0 0 1 0 0 0 0 0 >>> MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 >>> MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 1 0 0 0 0 1 0 0 0 249 >>> KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >>> PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 >>> PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 >>> PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 >>> SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 >>> SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 >>> SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 >>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' >>> Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Vector 870 762 13314200 0. >>> Vector Scatter 290 289 189584 0. >>> Index Set 1171 823 951096 0. >>> IS L to G Mapping 110 109 2156656 0. >>> Application Order 6 6 99952 0. >>> MatMFFD 1 1 776 0. >>> Matrix 189 189 24202324 0. >>> Matrix Null Space 4 4 2432 0. >>> Krylov Solver 90 90 190080 0. >>> DMKSP interface 1 1 648 0. >>> Preconditioner 90 90 89128 0. >>> SNES 1 1 1328 0. >>> SNESLineSearch 1 1 856 0. >>> DMSNES 1 1 664 0. >>> Distributed Mesh 2 2 9024 0. >>> Star Forest Bipartite Graph 4 4 3168 0. >>> Discrete System 2 2 1696 0. >>> Viewer 1 0 0 0. 
>>> >>> ======================================================================================================================== >>> Average time to get PetscTime(): 4.74e-08 >>> #PETSc Option Table entries: >>> -ib_ksp_converged_reason >>> -ib_ksp_monitor_true_residual >>> -ib_snes_type ksponly >>> -log_summary >>> -stokes_ib_pc_level_ksp_richardson_self_scae >>> -stokes_ib_pc_level_ksp_type gmres >>> -stokes_ib_pc_level_pc_asm_local_type additive >>> -stokes_ib_pc_level_pc_asm_type interpolate >>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >>> -stokes_ib_pc_level_sub_pc_type lu >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 >>> --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 >>> --with-hypre=1 --download-hypre=1 --with-hdf5=yes >>> --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ >>> ----------------------------------------- >>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu >>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit >>> Using PETSc directory: >>> /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc >>> Using PETSc arch: darwin-dbg >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} >>> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} >>> ----------------------------------------- >>> >>> Using include paths: >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include >>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include >>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include >>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -lpetsc >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib >>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib >>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin >>> -lclang_rt.osx -lmpicxx -lc++ >>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib >>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl >>> -lcrypto -lmpifort -lgfortran >>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >>> 
-L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 >>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran >>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx >>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib >>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem >>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin >>> -lclang_rt.osx -ldl >>> ----------------------------------------- >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Thu Jan 14 08:30:38 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 14:30:38 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> Message-ID: <960248D8-89F9-492D-A7EA-C503722E73C9@email.unc.edu> On Jan 14, 2016, at 8:44 AM, Matthew Knepley > wrote: On Thu, Jan 14, 2016 at 7:37 AM, Dave May > wrote: On 14 January 2016 at 14:24, Matthew Knepley > wrote: On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S > wrote: On Jan 13, 2016, at 6:22 PM, Matthew Knepley > wrote: Can you mail us a -log_summary for a rough cut? Sometimes its hard to interpret the data avalanche from one of those tools without a simple map. Does this indicate some hot spots? 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae You can try to avoid this by giving -options_left 2) Are you using any custom code during the solve? There is a gaping whole in the timing. It take 9s to do PCApply(), but something like a collective 1s to do everything we time under that. You are looking at the timing from a debug build. The results from the optimized build don't have such a gaping hole. It still looks like 50% of the runtime to me. Amneet, on OS X, I would echo Barry and suggest starting out using the timer profiler instrument (accessible through the Instruments app). -- Boyce Matt Since this is serial, we can use something like kcachegrind to look at performance as well, which should at least tell us what is sucking up this time so we can put a PETSc even on it. Thanks, Matt ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016 Using Petsc Development GIT revision: v3.6.1-2556-g6721a46 GIT Date: 2015-11-16 13:07:08 -0600 Max Max/Min Avg Total Time (sec): 1.039e+01 1.00000 1.039e+01 Objects: 2.834e+03 1.00000 2.834e+03 Flops: 3.552e+08 1.00000 3.552e+08 3.552e+08 Flops/sec: 3.418e+07 1.00000 3.418e+07 3.418e+07 Memory: 3.949e+07 1.00000 3.949e+07 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0391e+01 100.0% 3.5520e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. 
# # # ########################################################## Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37 VecMDot 533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 375 VecNorm 412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 388 VecScale 331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 2 VecCopy 116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 4 VecAYPX 92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 106 VecAXPBYCZ 36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 341 VecWAXPY 58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 VecMAXPY 638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 328 VecSwap 111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAssemblyBegin 607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 VecNormalize 260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00 5 1 0 0 0 5 1 0 0 0 8 BuildTwoSidedF 600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 6 18 0 0 0 6 18 0 0 0 104 MatSolve 8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00 7 63 0 0 0 7 63 0 0 0 328 MatLUFactorSym 85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatLUFactorNum 85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 350 MatScale 4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 625 MatAssemblyBegin 108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAXPY 4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatPtAP 4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 44 MatPtAPSymbolic 4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatPtAPNumeric 4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 103 MatGetSymTrans 4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 
1 0 0 0 249 KSPSetUp 90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 PCSetUp 90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 104 PCSetUpOnBlocks 91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 12 0 0 0 3 12 0 0 0 141 PCApply 13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98 0 0 0 87 98 0 0 0 39 SNESSolve 1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98 0 0 0 92 98 0 0 0 37 SNESFunctionEval 2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4 SNESJacobianEval 1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 870 762 13314200 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951096 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24202324 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 190080 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89128 0. SNES 1 1 1328 0. SNESLineSearch 1 1 856 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9024 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1696 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 4.74e-08 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_ksp_richardson_self_scae -stokes_ib_pc_level_ksp_type gmres -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_pc_asm_type interpolate -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg --with-debugging=1 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/ ----------------------------------------- Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit Using PETSc directory: /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc Using PETSc arch: darwin-dbg ----------------------------------------- Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using 
libraries: -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lpetsc -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin -lclang_rt.osx -lmpicxx -lc++ -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11 -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl -lcrypto -lmpifort -lgfortran -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1 -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin -lclang_rt.osx -ldl ----------------------------------------- -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jan 14 09:42:44 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 14 Jan 2016 09:42:44 -0600 Subject: [petsc-users] compiler error In-Reply-To: References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: Hopefully all changes should be documented in the changes file.. http://www.mcs.anl.gov/petsc/documentation/changes/dev.html You can use git to find out more info.. 
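For reference, a minimal sketch of the updated calling sequence, assuming the master-branch signature that the git grep transcript below reports; the -xmax option and variable name come from the quoted compile error, and everything else here (the driver program, the default value) is illustrative rather than part of the original report:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscScalar xmax = 1.0;          /* value used if -xmax is not given */
  PetscBool   set  = PETSC_FALSE;

  PetscInitialize(&argc, &argv, NULL, NULL);   /* error checking omitted for brevity */
  /* New interface: argument 1 is the options database (NULL = the global database),
     argument 2 is an optional options prefix (NULL = none), and the final argument
     reports whether -xmax was actually found on the command line. */
  PetscOptionsGetScalar(NULL, NULL, "-xmax", &xmax, &set);
  PetscPrintf(PETSC_COMM_WORLD, "xmax = %g (set = %d)\n", (double)PetscRealPart(xmax), (int)set);
  PetscFinalize();
  return 0;
}
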
balay at asterix /home/balay/petsc (master=) $ git grep PetscOptionsGetScalar include/ include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); balay at asterix /home/balay/petsc (hzhang/update-networkex=) $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 20)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); balay at asterix /home/balay/petsc (hzhang/update-networkex=) $ git show -q c5929fdf commit c5929fdf3082647d199855a5c1d0286204349b03 Author: Barry Smith Date: Fri Oct 30 21:20:21 2015 -0500 Complete update to new PetscOptions interface balay at asterix /home/balay/petsc (hzhang/update-networkex=) $ gitk c5929fdf etc.. Satish On Thu, 14 Jan 2016, Gideon Simpson wrote: > I know I did a git pull recently, but when did that change? What?s the fifth argument represent? > > -gideon > > > On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > > > > On Wed, 13 Jan 2016, Gideon Simpson wrote: > > > >> I haven?t seen this before: > >> > >> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c > >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" > >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > >> ^ > >> > >> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call > >> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > > > > Try: > > > > PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > > > > Satish > > From knepley at gmail.com Thu Jan 14 10:09:46 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 10:09:46 -0600 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar wrote: > I?ve written a fortan code (F90) for domain decomposition.* I've > specified **the paths of include files and libraries, but the > compiler/linker still * > > > *complained about undefined references.undefined reference to `vectorset_'* > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > *undefined reference to `dmdagetlocalinfo_'* > This function is not supported in Fortran since it takes a structure. Thanks, Matt > I?m attaching makefile and code. any help will be appreciated. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 14 10:20:30 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 10:20:30 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: On Thu, Jan 14, 2016 at 5:04 AM, Hoang Giang Bui wrote: > This is a very interesting thread because use of block matrix improves the > performance of AMG a lot. In my case is the elasticity problem. > > One more question I like to ask, which is more on the performance of the > solver. That if I have a coupled problem, says the point block is [u_x u_y > u_z p] in which entries of p block in stiffness matrix is in a much smaller > scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? > Also, is there a utility in PETSc which does automatic scaling of variables? > You could use PC Jacobi, or perhaps http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetDiagonalScale.html Thanks, Matt > Giang > > On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang wrote: > >> Okay that makes sense, thanks >> >> On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: >> >>> >>> > On Jan 13, 2016, at 10:24 PM, Justin Chang >>> wrote: >>> > >>> > Thanks Barry, >>> > >>> > 1) So for block matrices, the ja array is smaller. But what's the >>> "hardware" explanation for this performance improvement? Does it have to do >>> with spatial locality where you are more likely to reuse data in that ja >>> array, or does it have to do with the fact that loading/storing smaller >>> arrays are less likely to invoke a cache miss, thus reducing the amount of >>> bandwidth? >>> >>> There are two distinct reasons for the improvement: >>> >>> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" >>> savings is that you have to load something that is much smaller than >>> before. Cache/spatial locality have nothing to do with this particular >>> improvement. >>> >>> 2) The other improvement comes from the reuse of each x[j] value >>> multiplied by 5 values (a column) of the little block. The hardware >>> explanation is that x[j] can be reused in a register for the 5 multiplies >>> (while otherwise it would have to come from cache to register 5 times and >>> sometimes might even have been flushed from the cache so would have to come >>> from memory). This is why we have code like >>> >>> for (j=0; j>> xb = x + 5*(*idx++); >>> x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; >>> sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; >>> sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; >>> sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; >>> sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; >>> sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; >>> v += 25; >>> } >>> >>> to do the block multiple. >>> >>> > >>> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation >>> of more than one dof per point) then using the BAIJ format is highly >>> advisable. But if I want to form a nested matrix, say I am solving Stokes >>> equation, then each "submatrix" is of AIJ format? Can these sub matrices >>> also be BAIJ? >>> >>> Sure, but if you have separated all the variables of pressure, >>> velocity_x, velocity_y, etc into there own regions of the vector then the >>> block size for the sub matrices would be 1 so BAIJ does not help. 
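A self-contained sketch of the 5-by-5 point-block row product quoted earlier in this thread: the loop body follows the quoted kernel, while the function name, signature, and the assumption that the block row holds n nonzero blocks stored column-by-column are illustrative and not taken from the PETSc source.

#include <petscsys.h>

/* y = (one block row of A) * x for BAIJ with block size 5.
   n   - number of nonzero 5x5 blocks in this block row
   idx - block-column indices of those blocks
   v   - the blocks themselves, stored by column (25 scalars per block)
   x   - the full input vector; y - the 5 output values for this block row */
static void BlockRowMult_5(PetscInt n, const PetscInt *idx,
                           const PetscScalar *v, const PetscScalar *x,
                           PetscScalar y[5])
{
  PetscScalar sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0;
  PetscScalar x1, x2, x3, x4, x5;
  const PetscScalar *xb;
  PetscInt j;

  for (j = 0; j < n; j++) {
    xb = x + 5*(*idx++);                      /* x block for this block column */
    x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
    sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5;
    sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5;
    sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5;
    sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5;
    sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5;
    v += 25;                                  /* advance to the next 5x5 block */
  }
  y[0] = sum1; y[1] = sum2; y[2] = sum3; y[3] = sum4; y[4] = sum5;
}

Each x value is loaded once and then reused across the five rows of the block, which is exactly the register reuse described above; the column-index array is also one entry per block rather than one per scalar nonzero.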
>>> >>> There are Stokes solvers that use Vanka smoothing that keep the >>> variables interlaced and hence would use BAIJ and NOT use fieldsplit >>> >>> >>> > >>> > Thanks, >>> > Justin >>> > >>> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith >>> wrote: >>> > >>> > > On Jan 13, 2016, at 9:57 PM, Justin Chang >>> wrote: >>> > > >>> > > Hi all, >>> > > >>> > > 1) I am guessing MATMPIBAIJ could theoretically have better >>> performance than simply using MATMPIAIJ. Why is that? Is it similar to the >>> reasoning that block (dense) matrix-vector multiply is "faster" than simple >>> matrix-vector? >>> > >>> > See for example table 1 in >>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf >>> > >>> > > >>> > > 2) I am looking through the manual and online documentation and it >>> seems the term "block" used everywhere. In the section on "block matrices" >>> (3.1.3 of the manual), it refers to field splitting, where you could either >>> have a monolithic matrix or a nested matrix. Does that concept have >>> anything to do with MATMPIBAIJ? >>> > >>> > Unfortunately the numerical analysis literature uses the term block >>> in multiple ways. For small blocks, sometimes called "point-block" with >>> BAIJ and for very large blocks (where the blocks are sparse themselves). I >>> used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. >>> > > >>> > > It makes sense to me that one could create a BAIJ where if you have >>> 5 dofs of the same type of physics (e.g., five different primary species of >>> a geochemical reaction) per grid point, you could create a block size of 5. >>> And if you have different physics (e.g., velocity and pressure) you would >>> ideally want to separate them out (i.e., nested matrices) for better >>> preconditioning. >>> > >>> > Sometimes you put them together with BAIJ and sometimes you keep >>> them separate with nested matrices. >>> > >>> > > >>> > > Thanks, >>> > > Justin >>> > >>> > >>> >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Thu Jan 14 11:40:08 2016 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Thu, 14 Jan 2016 12:40:08 -0500 Subject: [petsc-users] compiler error In-Reply-To: References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> Message-ID: <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> Is this change going to be part of the next patch release, or the eventual 3.7? -gideon > On Jan 14, 2016, at 10:42 AM, Satish Balay wrote: > > Hopefully all changes should be documented in the changes file.. > > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html > > You can use git to find out more info.. 
> > balay at asterix /home/balay/petsc (master=) > $ git grep PetscOptionsGetScalar include/ > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 20)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git show -q c5929fdf > commit c5929fdf3082647d199855a5c1d0286204349b03 > Author: Barry Smith > Date: Fri Oct 30 21:20:21 2015 -0500 > > Complete update to new PetscOptions interface > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ gitk c5929fdf > > etc.. > > Satish > > > On Thu, 14 Jan 2016, Gideon Simpson wrote: > >> I know I did a git pull recently, but when did that change? What?s the fifth argument represent? >> >> -gideon >> >>> On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: >>> >>> On Wed, 13 Jan 2016, Gideon Simpson wrote: >>> >>>> I haven?t seen this before: >>>> >>>> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); >>>> ^ >>>> >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); >>> >>> Try: >>> >>> PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); >>> >>> Satish >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Jan 14 11:46:35 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 14 Jan 2016 11:46:35 -0600 Subject: [petsc-users] compiler error In-Reply-To: <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> Message-ID: future full release [3.7] will be from 'master' branch. future patch fix release [3.6.x] will be from 'maint' branch. You can choose the branch to use - based on your need.. Satish On Thu, 14 Jan 2016, Gideon Simpson wrote: > Is this change going to be part of the next patch release, or the eventual 3.7? > > -gideon > > > On Jan 14, 2016, at 10:42 AM, Satish Balay wrote: > > > > Hopefully all changes should be documented in the changes file.. > > > > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html > > > > You can use git to find out more info.. 
> > > > balay at asterix /home/balay/petsc (master=) > > $ git grep PetscOptionsGetScalar include/ > > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > > include/petscoptions.h:PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > > $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar > > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar *,PetscBool *); > > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 20)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalarArray(PetscOptions,const char[],const char[],PetscScalar[],PetscInt *,PetscBool *); > > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > > $ git show -q c5929fdf > > commit c5929fdf3082647d199855a5c1d0286204349b03 > > Author: Barry Smith > > Date: Fri Oct 30 21:20:21 2015 -0500 > > > > Complete update to new PetscOptions interface > > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > > $ gitk c5929fdf > > > > etc.. > > > > Satish > > > > > > On Thu, 14 Jan 2016, Gideon Simpson wrote: > > > >> I know I did a git pull recently, but when did that change? What?s the fifth argument represent? > >> > >> -gideon > >> > >>> On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > >>> > >>> On Wed, 13 Jan 2016, Gideon Simpson wrote: > >>> > >>>> I haven?t seen this before: > >>>> > >>>> /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include -I/home/simpson/software/petsc/arch-linux2-c-debug/include -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall `pwd`/fixed_batch.c > >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: argument of type "PetscScalar={PetscReal={double}} *" is incompatible with parameter of type "const char *" > >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > >>>> ^ > >>>> > >>>> /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few arguments in function call > >>>> PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > >>> > >>> Try: > >>> > >>> PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > >>> > >>> Satish > >> > >> > > From knepley at gmail.com Thu Jan 14 11:52:44 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 11:52:44 -0600 Subject: [petsc-users] compiler error In-Reply-To: <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> References: <18F5EB28-AE2E-4E28-B95E-2D6AD1DBECEE@gmail.com> <128DCD06-7F08-41A0-A581-CD081A14B429@gmail.com> Message-ID: On Thu, Jan 14, 2016 at 11:40 AM, Gideon Simpson wrote: > Is this change going to be part of the next patch release, or the eventual > 3.7? > Its in master, so it will be 3.7 Thanks, Matt > -gideon > > On Jan 14, 2016, at 10:42 AM, Satish Balay wrote: > > Hopefully all changes should be documented in the changes file.. > > http://www.mcs.anl.gov/petsc/documentation/changes/dev.html > > You can use git to find out more info.. 
> > balay at asterix /home/balay/petsc (master=) > $ git grep PetscOptionsGetScalar include/ > include/petscoptions.h:PETSC_EXTERN PetscErrorCode > PetscOptionsGetScalar(PetscOptions,const char[],const char[],PetscScalar > *,PetscBool *); > include/petscoptions.h:PETSC_EXTERN PetscErrorCode > PetscOptionsGetScalarArray(PetscOptions,const char[],const > char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git annotate include/petscoptions.h |grep PetscOptionsGetScalar > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 > 17)PETSC_EXTERN PetscErrorCode PetscOptionsGetScalar(PetscOptions,const > char[],const char[],PetscScalar *,PetscBool *); > c5929fdf (Barry Smith 2015-10-30 21:20:21 -0500 > 20)PETSC_EXTERN PetscErrorCode > PetscOptionsGetScalarArray(PetscOptions,const char[],const > char[],PetscScalar[],PetscInt *,PetscBool *); > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ git show -q c5929fdf > commit c5929fdf3082647d199855a5c1d0286204349b03 > Author: Barry Smith > Date: Fri Oct 30 21:20:21 2015 -0500 > > Complete update to new PetscOptions interface > balay at asterix /home/balay/petsc (hzhang/update-networkex=) > $ gitk c5929fdf > > etc.. > > Satish > > > On Thu, 14 Jan 2016, Gideon Simpson wrote: > > I know I did a git pull recently, but when did that change? What?s the > fifth argument represent? > > -gideon > > On Jan 13, 2016, at 11:54 PM, Satish Balay wrote: > > On Wed, 13 Jan 2016, Gideon Simpson wrote: > > I haven?t seen this before: > > /mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/bin/mpicc -o fixed_batch.o > -c -fPIC -wd1572 -g -I/home/simpson/software/petsc/include > -I/home/simpson/software/petsc/arch-linux2-c-debug/include > -I/mnt/HA/opt/openmpi/intel/2015/1.8.1-mlnx-ofed/include -Wall > `pwd`/fixed_batch.c > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): warning #167: > argument of type "PetscScalar={PetscReal={double}} *" is incompatible with > parameter of type "const char *" > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > ^ > > /home/simpson/projects/dnls/petsc/fixed_batch.c(44): error #165: too few > arguments in function call > PetscOptionsGetScalar(NULL,"-xmax",&xmax,NULL); > > > Try: > > PetscOptionsGetScalar(NULL,NULL,"-xmax",&xmax,NULL); > > Satish > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 14 12:50:50 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 12:50:50 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: > On Jan 14, 2016, at 5:04 AM, Hoang Giang Bui wrote: > > This is a very interesting thread because use of block matrix improves the performance of AMG a lot. In my case is the elasticity problem. > > One more question I like to ask, which is more on the performance of the solver. That if I have a coupled problem, says the point block is [u_x u_y u_z p] in which entries of p block in stiffness matrix is in a much smaller scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? Also, is there a utility in PETSc which does automatic scaling of variables? 
We highly recommend scaling in your MODEL (as much as possible) to have similar scaling of the various variables see https://en.wikipedia.org/wiki/Nondimensionalization. The problem with trying to do the scaling numerically after you have discretized your model is that the effect of the finite arithmetic as you "rescale" means that you lose possibly all the accuracy during the rescaling. For example say your "badly scaled" matrix J has a condition number of 1.e15; now you apply a numerical algorithm to "rescale" the variables to get a much better conditioned matrix. Since the accuracy of the numerical algorithm depends on the conditioning of J it will give you essentially no digits correct (due to the finite arithmetic) in your new J prime and in the transformation between your new and old variables. Barry > > Giang > > On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang wrote: > Okay that makes sense, thanks > > On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith wrote: > > > On Jan 13, 2016, at 10:24 PM, Justin Chang wrote: > > > > Thanks Barry, > > > > 1) So for block matrices, the ja array is smaller. But what's the "hardware" explanation for this performance improvement? Does it have to do with spatial locality where you are more likely to reuse data in that ja array, or does it have to do with the fact that loading/storing smaller arrays are less likely to invoke a cache miss, thus reducing the amount of bandwidth? > > There are two distinct reasons for the improvement: > > 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware" savings is that you have to load something that is much smaller than before. Cache/spatial locality have nothing to do with this particular improvement. > > 2) The other improvement comes from the reuse of each x[j] value multiplied by 5 values (a column) of the little block. The hardware explanation is that x[j] can be reused in a register for the 5 multiplies (while otherwise it would have to come from cache to register 5 times and sometimes might even have been flushed from the cache so would have to come from memory). This is why we have code like > > for (j=0; j xb = x + 5*(*idx++); > x1 = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4]; > sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5; > sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5; > sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5; > sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5; > sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5; > v += 25; > } > > to do the block multiple. > > > > > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of more than one dof per point) then using the BAIJ format is highly advisable. But if I want to form a nested matrix, say I am solving Stokes equation, then each "submatrix" is of AIJ format? Can these sub matrices also be BAIJ? > > Sure, but if you have separated all the variables of pressure, velocity_x, velocity_y, etc into there own regions of the vector then the block size for the sub matrices would be 1 so BAIJ does not help. > > There are Stokes solvers that use Vanka smoothing that keep the variables interlaced and hence would use BAIJ and NOT use fieldsplit > > > > > > Thanks, > > Justin > > > > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith wrote: > > > > > On Jan 13, 2016, at 9:57 PM, Justin Chang wrote: > > > > > > Hi all, > > > > > > 1) I am guessing MATMPIBAIJ could theoretically have better performance than simply using MATMPIAIJ. Why is that? 
Is it similar to the reasoning that block (dense) matrix-vector multiply is "faster" than simple matrix-vector? > > > > See for example table 1 in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf > > > > > > > > 2) I am looking through the manual and online documentation and it seems the term "block" used everywhere. In the section on "block matrices" (3.1.3 of the manual), it refers to field splitting, where you could either have a monolithic matrix or a nested matrix. Does that concept have anything to do with MATMPIBAIJ? > > > > Unfortunately the numerical analysis literature uses the term block in multiple ways. For small blocks, sometimes called "point-block" with BAIJ and for very large blocks (where the blocks are sparse themselves). I used fieldsplit for big sparse blocks to try to avoid confusion in PETSc. > > > > > > It makes sense to me that one could create a BAIJ where if you have 5 dofs of the same type of physics (e.g., five different primary species of a geochemical reaction) per grid point, you could create a block size of 5. And if you have different physics (e.g., velocity and pressure) you would ideally want to separate them out (i.e., nested matrices) for better preconditioning. > > > > Sometimes you put them together with BAIJ and sometimes you keep them separate with nested matrices. > > > > > > > > Thanks, > > > Justin > > > > > > > From jed at jedbrown.org Thu Jan 14 12:57:37 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 14 Jan 2016 11:57:37 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> Message-ID: <87fuy07zvi.fsf@jedbrown.org> Hoang Giang Bui writes: > One more question I like to ask, which is more on the performance of the > solver. That if I have a coupled problem, says the point block is [u_x u_y > u_z p] in which entries of p block in stiffness matrix is in a much smaller > scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? You should scale the model (as Barry says). But the names of your variables suggest that the system is a saddle point problem, in which case there's a good chance AMG won't work at all. For example, BoomerAMG produces a singular preconditioner in similar contexts, such that the preconditioned residual drops smoothly while the true residual stagnates (the equations are not solved at all). So be vary careful if you think it's "working". -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Jan 14 13:08:15 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 13:08:15 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: <87fuy07zvi.fsf@jedbrown.org> References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: > > Hoang Giang Bui writes: >> One more question I like to ask, which is more on the performance of the >> solver. That if I have a coupled problem, says the point block is [u_x u_y >> u_z p] in which entries of p block in stiffness matrix is in a much smaller >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? > > You should scale the model (as Barry says). But the names of your > variables suggest that the system is a saddle point problem, in which > case there's a good chance AMG won't work at all. 
For example, > BoomerAMG produces a singular preconditioner in similar contexts, such > that the preconditioned residual drops smoothly while the true residual > stagnates (the equations are not solved at all). So be vary careful if > you think it's "working". The PCFIEDSPLIT preconditioner is designed for helping to solve saddle point problems. Barry From bsmith at mcs.anl.gov Thu Jan 14 13:24:25 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 13:24:25 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> Message-ID: <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. From the output we have: Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) Time to set up the preconditioner is 19% (10 + 9) Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). Barry > On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >> >> I see one hot spot: > > > Here is with opt build > > ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- > > ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 > Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 > > Max Max/Min Avg Total > Time (sec): 1.018e+00 1.00000 1.018e+00 > Objects: 2.935e+03 1.00000 2.935e+03 > Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 > Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N --> 2N flops > and VecAXPY() for complex vectors of length N --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts %Total Avg %Total counts %Total > 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on interpreting output. 
> Phase summary info: > Count: number of times phase was executed > Time and Flops: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this phase > %M - percent messages in this phase %L - percent message lengths in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 > VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 > VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 > VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 > VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 > VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 > VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 > VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 > VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 > VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 > VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 > MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 > MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 > MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 > MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 > MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > 
MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 > MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 > MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 > MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 > MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 > PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 > PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 > PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 > SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 > SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 > SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 971 839 15573352 0. > Vector Scatter 290 289 189584 0. > Index Set 1171 823 951928 0. > IS L to G Mapping 110 109 2156656 0. > Application Order 6 6 99952 0. > MatMFFD 1 1 776 0. > Matrix 189 189 24083332 0. > Matrix Null Space 4 4 2432 0. > Krylov Solver 90 90 122720 0. > DMKSP interface 1 1 648 0. > Preconditioner 90 90 89872 0. > SNES 1 1 1328 0. > SNESLineSearch 1 1 984 0. > DMSNES 1 1 664 0. > Distributed Mesh 2 2 9168 0. > Star Forest Bipartite Graph 4 4 3168 0. > Discrete System 2 2 1712 0. > Viewer 1 0 0 0. 
> ======================================================================================================================== > Average time to get PetscTime(): 9.53674e-07 > #PETSc Option Table entries: > -ib_ksp_converged_reason > -ib_ksp_monitor_true_residual > -ib_snes_type ksponly > -log_summary > -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal > -stokes_ib_pc_level_0_sub_pc_type ilu > -stokes_ib_pc_level_ksp_richardson_self_scale > -stokes_ib_pc_level_ksp_type richardson > -stokes_ib_pc_level_pc_asm_local_type additive > -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal > -stokes_ib_pc_level_sub_pc_type lu > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 > ----------------------------------------- > Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta > Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty > Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc > Using PETSc arch: linux-opt > ----------------------------------------- > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl > ----------------------------------------- From boyceg at email.unc.edu Thu Jan 14 14:01:10 2016 From: boyceg at 
email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 20:01:10 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> Message-ID: <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> > On Jan 14, 2016, at 2:24 PM, Barry Smith wrote: > > > Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. > > From the output we have: > > Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) > Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) > Time to set up the preconditioner is 19% (10 + 9) > Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) > > So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) > > Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. -- Boyce > > > Barry > >> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: >> >> >> >>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >>> >>> I see one hot spot: >> >> >> Here is with opt build >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >> >> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 >> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 >> >> Max Max/Min Avg Total >> Time (sec): 1.018e+00 1.00000 1.018e+00 >> Objects: 2.935e+03 1.00000 2.935e+03 >> Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 >> Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 >> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >> MPI Reductions: 0.000e+00 0.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N --> 2N flops >> and VecAXPY() for complex vectors of length N --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts %Total Avg %Total counts %Total >> 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length (bytes) >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
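One way to attribute the time that -log_summary otherwise leaves unexplained inside the linear solve (the roughly 23 % Barry points to above) is to register a user stage and event around the matrix-free pieces. A minimal sketch, in which the names, the UserPCApply wrapper, and where it is called from are illustrative assumptions rather than IBAMR code:

#include <petscsys.h>

static PetscClassId  UserClassId;
static PetscLogEvent UserPCApplyEvent;
static PetscLogStage SolveStage;

/* Call once after PetscInitialize(). */
PetscErrorCode UserLogSetup(void)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = PetscClassIdRegister("UserPC", &UserClassId);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("UserPCApply", UserClassId, &UserPCApplyEvent);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("IB solve", &SolveStage);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Wrap the custom (matrix-free) preconditioner apply so its time shows up
   as its own line in the -log_summary event table. */
PetscErrorCode UserPCApply(void *ctx)
{
  PetscErrorCode ierr;
  PetscFunctionBeginUser;
  ierr = PetscLogEventBegin(UserPCApplyEvent, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... the actual SAMRAI/IBAMR smoother or solve work would go here ... */
  ierr = PetscLogEventEnd(UserPCApplyEvent, 0, 0, 0, 0);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The outer solve can likewise be bracketed with PetscLogStagePush(SolveStage) and PetscLogStagePop(), so that setup work outside PETSc is separated from the solve itself in the summary.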
>> %T - percent time in this phase %F - percent flops in this phase >> %M - percent messages in this phase %L - percent message lengths in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 >> VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 >> VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 >> VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 >> VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 >> VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 >> VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 >> VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 >> VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 >> VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 >> VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 >> MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 >> MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >> MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 >> MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 >> MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 
0 0 0 >> MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 >> MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 >> MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >> MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >> MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 >> PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 >> PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 >> PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 >> SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 >> SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 >> SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' Mem. >> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Vector 971 839 15573352 0. >> Vector Scatter 290 289 189584 0. >> Index Set 1171 823 951928 0. >> IS L to G Mapping 110 109 2156656 0. >> Application Order 6 6 99952 0. >> MatMFFD 1 1 776 0. >> Matrix 189 189 24083332 0. >> Matrix Null Space 4 4 2432 0. >> Krylov Solver 90 90 122720 0. >> DMKSP interface 1 1 648 0. >> Preconditioner 90 90 89872 0. >> SNES 1 1 1328 0. >> SNESLineSearch 1 1 984 0. >> DMSNES 1 1 664 0. >> Distributed Mesh 2 2 9168 0. >> Star Forest Bipartite Graph 4 4 3168 0. >> Discrete System 2 2 1712 0. >> Viewer 1 0 0 0. 
>> ======================================================================================================================== >> Average time to get PetscTime(): 9.53674e-07 >> #PETSc Option Table entries: >> -ib_ksp_converged_reason >> -ib_ksp_monitor_true_residual >> -ib_snes_type ksponly >> -log_summary >> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal >> -stokes_ib_pc_level_0_sub_pc_type ilu >> -stokes_ib_pc_level_ksp_richardson_self_scale >> -stokes_ib_pc_level_ksp_type richardson >> -stokes_ib_pc_level_pc_asm_local_type additive >> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >> -stokes_ib_pc_level_sub_pc_type lu >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >> ----------------------------------------- >> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta >> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty >> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc >> Using PETSc arch: linux-opt >> ----------------------------------------- >> >> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} >> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} >> ----------------------------------------- >> >> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include >> ----------------------------------------- >> >> Using C linker: mpicc >> Using Fortran linker: mpif90 >> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl >> ----------------------------------------- From bsmith at mcs.anl.gov Thu 
Jan 14 14:09:12 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 14:09:12 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> Message-ID: > On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene wrote: > >> >> On Jan 14, 2016, at 2:24 PM, Barry Smith wrote: >> >> >> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. >> >> From the output we have: >> >> Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) >> Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) >> Time to set up the preconditioner is 19% (10 + 9) >> Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) >> >> So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) >> >> Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). > > Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. Just put an PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside PETSc setup time. The PETSc time looks reasonable at most I can only image any optimizations we could do bringing it down a small percentage. Barry > > -- Boyce > >> >> >> Barry >> >>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: >>> >>> >>> >>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >>>> >>>> I see one hot spot: >>> >>> >>> Here is with opt build >>> >>> ************************************************************************************************************************ >>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** >>> ************************************************************************************************************************ >>> >>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>> >>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 >>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 >>> >>> Max Max/Min Avg Total >>> Time (sec): 1.018e+00 1.00000 1.018e+00 >>> Objects: 2.935e+03 1.00000 2.935e+03 >>> Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 >>> Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 >>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>> MPI Reductions: 0.000e+00 0.00000 >>> >>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>> e.g., VecAXPY() for real vectors of length N --> 2N flops >>> and VecAXPY() for complex vectors of length N --> 8N flops >>> >>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- >>> Avg %Total Avg %Total counts %Total Avg %Total counts %Total >>> 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>> >>> ------------------------------------------------------------------------------------------------------------------------ >>> See the 'Profiling' chapter of the users' manual for details on interpreting output. >>> Phase summary info: >>> Count: number of times phase was executed >>> Time and Flops: Max - maximum over all processors >>> Ratio - ratio of maximum to minimum over all processors >>> Mess: number of messages sent >>> Avg. len: average message length (bytes) >>> Reduct: number of global reductions >>> Global: entire computation >>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
>>> %T - percent time in this phase %F - percent flops in this phase >>> %M - percent messages in this phase %L - percent message lengths in this phase >>> %R - percent reductions in this phase >>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) >>> ------------------------------------------------------------------------------------------------------------------------ >>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> --- Event Stage 0: Main Stage >>> >>> VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 >>> VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 >>> VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 >>> VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 >>> VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 >>> VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 >>> VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 >>> VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 >>> VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 >>> VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 >>> VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 >>> MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 >>> MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>> MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 >>> MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 >>> MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>> MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 
0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 >>> MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 >>> MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>> MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>> MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>> KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 >>> PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 >>> PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 >>> PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 >>> SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 >>> SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 >>> SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 >>> ------------------------------------------------------------------------------------------------------------------------ >>> >>> Memory usage is given in bytes: >>> >>> Object Type Creations Destructions Memory Descendants' Mem. >>> Reports information only for process 0. >>> >>> --- Event Stage 0: Main Stage >>> >>> Vector 971 839 15573352 0. >>> Vector Scatter 290 289 189584 0. >>> Index Set 1171 823 951928 0. >>> IS L to G Mapping 110 109 2156656 0. >>> Application Order 6 6 99952 0. >>> MatMFFD 1 1 776 0. >>> Matrix 189 189 24083332 0. >>> Matrix Null Space 4 4 2432 0. >>> Krylov Solver 90 90 122720 0. >>> DMKSP interface 1 1 648 0. >>> Preconditioner 90 90 89872 0. >>> SNES 1 1 1328 0. >>> SNESLineSearch 1 1 984 0. >>> DMSNES 1 1 664 0. >>> Distributed Mesh 2 2 9168 0. >>> Star Forest Bipartite Graph 4 4 3168 0. >>> Discrete System 2 2 1712 0. >>> Viewer 1 0 0 0. 
>>> ======================================================================================================================== >>> Average time to get PetscTime(): 9.53674e-07 >>> #PETSc Option Table entries: >>> -ib_ksp_converged_reason >>> -ib_ksp_monitor_true_residual >>> -ib_snes_type ksponly >>> -log_summary >>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal >>> -stokes_ib_pc_level_0_sub_pc_type ilu >>> -stokes_ib_pc_level_ksp_richardson_self_scale >>> -stokes_ib_pc_level_ksp_type richardson >>> -stokes_ib_pc_level_pc_asm_local_type additive >>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >>> -stokes_ib_pc_level_sub_pc_type lu >>> #End of PETSc Option Table entries >>> Compiled without FORTRAN kernels >>> Compiled with full precision matrices (default) >>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >>> ----------------------------------------- >>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta >>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty >>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc >>> Using PETSc arch: linux-opt >>> ----------------------------------------- >>> >>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} >>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} >>> ----------------------------------------- >>> >>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include >>> ----------------------------------------- >>> >>> Using C linker: mpicc >>> Using Fortran linker: mpif90 >>> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl >>> 
----------------------------------------- From boyceg at email.unc.edu Thu Jan 14 14:30:54 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Thu, 14 Jan 2016 20:30:54 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> Message-ID: <376E2C56-9E41-4508-BAE6-9920526820BC@email.unc.edu> On Jan 14, 2016, at 3:09 PM, Barry Smith > wrote: On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene > wrote: On Jan 14, 2016, at 2:24 PM, Barry Smith > wrote: Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. >From the output we have: Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) Time to set up the preconditioner is 19% (10 + 9) Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. Just put an PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside PETSc setup time. The PETSc time looks reasonable at most I can only image any optimizations we could do bringing it down a small percentage. Here is a bit more info about what we are trying to do: This is a Vanka-type MG preconditioner for a Stokes-like system on a structured grid. (Currently just uniform grids, but hopefully soon with AMR.) For the smoother, we are using damped Richardson + ASM with relatively small block subdomains --- e.g., all DOFs associated with 8x8 cells in 2D (~300 DOFs), or 8x8x8 in 3D (~2500 DOFs). Unfortunately, MG iteration counts really tank when using smaller subdomains. I can't remember whether we have quantified this carefully, but PCASM seems to bog down with smaller subdomains. A question is whether there are different implementation choices that could make the case of "lots of little subdomains" run faster. But before we get to that, Amneet and I should take a more careful look at overall solver performance. (We are also starting to play around with PCFIELDSPLIT for this problem too, although we don't have many ideas about how to handle the Schur complement.) Thanks, -- Boyce Barry -- Boyce Barry On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S > wrote: On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene > wrote: I see one hot spot: Here is with opt build ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 Max Max/Min Avg Total Time (sec): 1.018e+00 1.00000 1.018e+00 Objects: 2.935e+03 1.00000 2.935e+03 Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 MatAssemblyBegin 108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 MatPtAP 4 1.0 4.4426e-02 
1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 971 839 15573352 0. Vector Scatter 290 289 189584 0. Index Set 1171 823 951928 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 24083332 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 122720 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89872 0. SNES 1 1 1328 0. SNESLineSearch 1 1 984 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9168 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1712 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 9.53674e-07 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_0_sub_pc_type ilu -stokes_ib_pc_level_ksp_richardson_self_scale -stokes_ib_pc_level_ksp_type richardson -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc Using PETSc arch: linux-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl ----------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From song.gao2 at mail.mcgill.ca Thu Jan 14 15:01:00 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Thu, 14 Jan 2016 16:01:00 -0500 Subject: [petsc-users] Profile a matrix-free solver. Message-ID: Hello, I am profiling a finite element Navier-Stokes solver. It uses the Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a matrix-free version of Symmetic Gauss-Seidel ). The log summary is attached. Four events are registered. compute_rhs is compute rhs (used by MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. I'm wondering, is the percent time of the events reasonable in the table? I see 69% time is spent on matmult_mffd. Is it expected in matrix-free method? What might be a good starting point of profiling this solver? Thank you in advance. Song Gao -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_summary Type: application/octet-stream Size: 9058 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Jan 14 15:13:40 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 15:13:40 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <376E2C56-9E41-4508-BAE6-9920526820BC@email.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <61E5EA75-AC79-4CB4-8114-645D978EEEC6@email.unc.edu> <376E2C56-9E41-4508-BAE6-9920526820BC@email.unc.edu> Message-ID: > On Jan 14, 2016, at 2:30 PM, Griffith, Boyce Eugene wrote: > >> >> On Jan 14, 2016, at 3:09 PM, Barry Smith wrote: >> >>> >>> On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene wrote: >>> >>>> >>>> On Jan 14, 2016, at 2:24 PM, Barry Smith wrote: >>>> >>>> >>>> Matt is right, there is a lot of "missing" time from the output. Please send the output from -ksp_view so we can see exactly what solver is being used. >>>> >>>> From the output we have: >>>> >>>> Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is taking about 22% of the time) >>>> Linear solver 77 % of the time (this is reasonable pretty much the entire cost of the nonlinear solve is the linear solve) >>>> Time to set up the preconditioner is 19% (10 + 9) >>>> Time of iteration in KSP 35 % (this is the sum of the vector operations and MatMult() and MatSolve()) >>>> >>>> So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???) >>>> >>>> Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). >>> >>> Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some matrix-based stuff implemented using PETSc along with some matrix-free stuff that is built on top of SAMRAI. Amneet and I should take a look at performance off-list first. >> >> Just put an PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the outside PETSc setup time. >> >> The PETSc time looks reasonable at most I can only image any optimizations we could do bringing it down a small percentage. > > Here is a bit more info about what we are trying to do: > > This is a Vanka-type MG preconditioner for a Stokes-like system on a structured grid. 
(Currently just uniform grids, but hopefully soon with AMR.) For the smoother, we are using damped Richardson + ASM with relatively small block subdomains --- e.g., all DOFs associated with 8x8 cells in 2D (~300 DOFs), or 8x8x8 in 3D (~2500 DOFs). Unfortunately, MG iteration counts really tank when using smaller subdomains. > > I can't remember whether we have quantified this carefully, but PCASM seems to bog down with smaller subdomains. A question is whether there are different implementation choices that could make the case of "lots of little subdomains" run faster. This is possibly somewhere where WE (PETSc) could perhaps due a better job. When originally written we definitely were biased to a small number of large subdomains so things like getting the sub matrices (and even iterating over them) could possibly be optimized when there are many. However, as you note, in the current runs this is definitely not the issue. Barry > But before we get to that, Amneet and I should take a more careful look at overall solver performance. > > (We are also starting to play around with PCFIELDSPLIT for this problem too, although we don't have many ideas about how to handle the Schur complement.) > > Thanks, > > -- Boyce > >> >> >> Barry >> >>> >>> -- Boyce >>> >>>> >>>> >>>> Barry >>>> >>>>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S wrote: >>>>> >>>>> >>>>> >>>>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene wrote: >>>>>> >>>>>> I see one hot spot: >>>>> >>>>> >>>>> Here is with opt build >>>>> >>>>> ************************************************************************************************************************ >>>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** >>>>> ************************************************************************************************************************ >>>>> >>>>> ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- >>>>> >>>>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016 >>>>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 >>>>> >>>>> Max Max/Min Avg Total >>>>> Time (sec): 1.018e+00 1.00000 1.018e+00 >>>>> Objects: 2.935e+03 1.00000 2.935e+03 >>>>> Flops: 4.957e+08 1.00000 4.957e+08 4.957e+08 >>>>> Flops/sec: 4.868e+08 1.00000 4.868e+08 4.868e+08 >>>>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 >>>>> MPI Reductions: 0.000e+00 0.00000 >>>>> >>>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) >>>>> e.g., VecAXPY() for real vectors of length N --> 2N flops >>>>> and VecAXPY() for complex vectors of length N --> 8N flops >>>>> >>>>> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- >>>>> Avg %Total Avg %Total counts %Total Avg %Total counts %Total >>>>> 0: Main Stage: 1.0183e+00 100.0% 4.9570e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% >>>>> >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> See the 'Profiling' chapter of the users' manual for details on interpreting output. 
>>>>> Phase summary info: >>>>> Count: number of times phase was executed >>>>> Time and Flops: Max - maximum over all processors >>>>> Ratio - ratio of maximum to minimum over all processors >>>>> Mess: number of messages sent >>>>> Avg. len: average message length (bytes) >>>>> Reduct: number of global reductions >>>>> Global: entire computation >>>>> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). >>>>> %T - percent time in this phase %F - percent flops in this phase >>>>> %M - percent messages in this phase %L - percent message lengths in this phase >>>>> %R - percent reductions in this phase >>>>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> Event Count Time (sec) Flops --- Global --- --- Stage --- Total >>>>> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> >>>>> --- Event Stage 0: Main Stage >>>>> >>>>> VecDot 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1120 >>>>> VecDotNorm2 272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2920 >>>>> VecMDot 624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 627 >>>>> VecNorm 565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1151 >>>>> VecScale 86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 279 >>>>> VecCopy 28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecSet 14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> VecAXPY 903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550 >>>>> VecAYPX 225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 681 >>>>> VecAXPBYCZ 42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2014 >>>>> VecWAXPY 70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1528 >>>>> VecMAXPY 641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 475 >>>>> VecSwap 135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecAssemblyBegin 745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecAssemblyEnd 745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> VecScatterBegin 40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>> BuildTwoSidedF 738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatMult 513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 16 0 0 0 9 16 0 0 0 849 >>>>> MatSolve 13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70 0 0 0 23 70 0 0 0 1460 >>>>> MatLUFactorSym 84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 >>>>> MatLUFactorNum 85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 8 0 0 0 4 8 0 0 0 1058 >>>>> MatILUFactorSym 1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatScale 4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 984 >>>>> MatAssemblyBegin 108 1.0 
6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatAssemblyEnd 108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatGetRow 33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >>>>> MatGetRowIJ 85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> MatGetSubMatrice 4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> MatGetOrdering 85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> MatAXPY 4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0 >>>>> MatPtAP 4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 112 >>>>> MatPtAPSymbolic 4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 >>>>> MatPtAPNumeric 4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 298 >>>>> MatGetSymTrans 4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> KSPGMRESOrthog 16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >>>>> KSPSetUp 90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>>>> KSPSolve 1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99 0 0 0 77 99 0 0 0 622 >>>>> PCSetUp 90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 8 0 0 0 10 8 0 0 0 420 >>>>> PCSetUpOnBlocks 112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 8 0 0 0 9 8 0 0 0 479 >>>>> PCApply 16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99 0 0 0 71 99 0 0 0 680 >>>>> SNESSolve 1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99 0 0 0 78 99 0 0 0 619 >>>>> SNESFunctionEval 2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14 >>>>> SNESJacobianEval 1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 >>>>> ------------------------------------------------------------------------------------------------------------------------ >>>>> >>>>> Memory usage is given in bytes: >>>>> >>>>> Object Type Creations Destructions Memory Descendants' Mem. >>>>> Reports information only for process 0. >>>>> >>>>> --- Event Stage 0: Main Stage >>>>> >>>>> Vector 971 839 15573352 0. >>>>> Vector Scatter 290 289 189584 0. >>>>> Index Set 1171 823 951928 0. >>>>> IS L to G Mapping 110 109 2156656 0. >>>>> Application Order 6 6 99952 0. >>>>> MatMFFD 1 1 776 0. >>>>> Matrix 189 189 24083332 0. >>>>> Matrix Null Space 4 4 2432 0. >>>>> Krylov Solver 90 90 122720 0. >>>>> DMKSP interface 1 1 648 0. >>>>> Preconditioner 90 90 89872 0. >>>>> SNES 1 1 1328 0. >>>>> SNESLineSearch 1 1 984 0. >>>>> DMSNES 1 1 664 0. >>>>> Distributed Mesh 2 2 9168 0. >>>>> Star Forest Bipartite Graph 4 4 3168 0. >>>>> Discrete System 2 2 1712 0. >>>>> Viewer 1 0 0 0. 
>>>>> ======================================================================================================================== >>>>> Average time to get PetscTime(): 9.53674e-07 >>>>> #PETSc Option Table entries: >>>>> -ib_ksp_converged_reason >>>>> -ib_ksp_monitor_true_residual >>>>> -ib_snes_type ksponly >>>>> -log_summary >>>>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal >>>>> -stokes_ib_pc_level_0_sub_pc_type ilu >>>>> -stokes_ib_pc_level_ksp_richardson_self_scale >>>>> -stokes_ib_pc_level_ksp_type richardson >>>>> -stokes_ib_pc_level_pc_asm_local_type additive >>>>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal >>>>> -stokes_ib_pc_level_sub_pc_type lu >>>>> #End of PETSc Option Table entries >>>>> Compiled without FORTRAN kernels >>>>> Compiled with full precision matrices (default) >>>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 >>>>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 >>>>> ----------------------------------------- >>>>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta >>>>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty >>>>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc >>>>> Using PETSc arch: linux-opt >>>>> ----------------------------------------- >>>>> >>>>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} >>>>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} >>>>> ----------------------------------------- >>>>> >>>>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include >>>>> ----------------------------------------- >>>>> >>>>> Using C linker: mpicc >>>>> Using Fortran linker: mpif90 >>>>> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl 
-Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl >>>>> ----------------------------------------- From bsmith at mcs.anl.gov Thu Jan 14 15:24:08 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 15:24:08 -0600 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: Message-ID: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> So KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this makes sense; the solver time is essentially the multiply time plus the PCApply time. compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 1.1e+04 71 0100100 39 71 0100100 39 0 LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 Depending on the "quality" of the preconditioner (if it is really good) one expects the preconditioner time to be larger than the MatMult(). Only for simple preconditioners (like Jacobi) does one see it being much less than the MatMult(). For matrix based solvers the amount of work in SGS is as large as the amount of work in the MatMult() if not more, so I would expect the time of the preconditioner to be higher than the time of the multiply. So based on knowing almost nothing I think the MatMult_ is taking more time then it should unless you are ignoring (skipping) a lot of the terms in your matrix-free SGS; then it is probably reasonable. Barry > On Jan 14, 2016, at 3:01 PM, Song Gao wrote: > > Hello, > > I am profiling a finite element Navier-Stokes solver. It uses the Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a matrix-free version of Symmetic Gauss-Seidel ). The log summary is attached. Four events are registered. compute_rhs is compute rhs (used by MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > I'm wondering, is the percent time of the events reasonable in the table? I see 69% time is spent on matmult_mffd. Is it expected in matrix-free method? What might be a good starting point of profiling this solver? Thank you in advance. > > > Song Gao > From knepley at gmail.com Thu Jan 14 15:25:00 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 14 Jan 2016 15:25:00 -0600 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: Message-ID: On Thu, Jan 14, 2016 at 3:01 PM, Song Gao wrote: > Hello, > > I am profiling a finite element Navier-Stokes solver. It uses the > Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a > matrix-free version of Symmetic Gauss-Seidel ). The log summary is > attached. Four events are registered. compute_rhs is compute rhs (used by > MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the > custom preconditioner. I didn't call PetscLogFlops so these flops are > zeros. > > I'm wondering, is the percent time of the events reasonable in the table? > I see 69% time is spent on matmult_mffd. Is it expected in matrix-free > method? What might be a good starting point of profiling this solver? Thank > you in advance. > The way I read this, you are taking about 23 iterates/solve, and most of your work is residual computation which should be highly parallelizable/vectorizable. This seems great to me. 
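Barry's earlier suggestion (wrap the non-PETSc pieces in PetscLogEvents and put a stage around the setup done outside SNESSolve) and Song's note that his custom events report zero flops both come down to the same small amount of instrumentation. The sketch below is a self-contained toy, not code from IBAMR or Song's solver: the stage name, the dummy loop, and the flop count are placeholders, while the PetscLogStageRegister / PetscLogEventRegister / PetscLogFlops calls are the standard PETSc profiling API, so the event shows up in the -log_summary tables with a meaningful time and Mflop/s entry.

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogStage  setup_stage;
  PetscLogEvent  rhs_event;
  PetscClassId   classid;
  PetscInt       i, n = 1000000;
  PetscReal      sum = 0.0;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* A stage for setup work done outside SNESSolve()/KSPSolve() */
  ierr = PetscLogStageRegister("User setup", &setup_stage);CHKERRQ(ierr);
  /* An event for a user kernel, e.g. a matrix-free residual evaluation */
  ierr = PetscClassIdRegister("User classes", &classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("compute_rhs", classid, &rhs_event);CHKERRQ(ierr);

  ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
  /* ... mesh/data-structure setup would go here ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogEventBegin(rhs_event, 0, 0, 0, 0);CHKERRQ(ierr);
  for (i = 0; i < n; ++i) sum += 1.0 / (PetscReal)(i + 1);  /* stand-in for the real kernel */
  ierr = PetscLogFlops(2.0 * n);CHKERRQ(ierr);              /* 1 divide + 1 add per iteration */
  ierr = PetscLogEventEnd(rhs_event, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscPrintf(PETSC_COMM_WORLD, "sum = %g\n", (double)sum);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running this with -log_summary (as in the runs above) lists "User setup" in the summary of stages and "compute_rhs" in the Main Stage event table; in a real code the PetscLogFlops argument would be an estimate of the operations actually performed inside the event, which is what makes the Mflop/s column nonzero for matrix-free kernels.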
Matt > > > Song Gao > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 14 18:31:58 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 15 Jan 2016 00:31:58 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> Message-ID: <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> On Jan 14, 2016, at 11:24 AM, Barry Smith > wrote: Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). @Barry ? Attached is the -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: main2dProfiling.numbers Type: application/octet-stream Size: 207814 bytes Desc: main2dProfiling.numbers URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 14 18:36:14 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 15 Jan 2016 00:36:14 +0000 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> Message-ID: And the PETSc log summary for comparison ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 19:34:38 2016 Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2 GIT Date: 2016-01-13 21:30:26 -0600 Max Max/Min Avg Total Time (sec): 6.223e-01 1.00000 6.223e-01 Objects: 2.618e+03 1.00000 2.618e+03 Flops: 1.948e+08 1.00000 1.948e+08 1.948e+08 Flops/sec: 3.129e+08 1.00000 3.129e+08 3.129e+08 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 6.2232e-01 100.0% 1.9476e+08 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 4 1.0 2.9087e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1139 VecDotNorm2 180 1.0 1.0626e-03 1.0 3.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2983 VecMDot 288 1.0 3.8970e-03 1.0 2.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 611 VecNorm 113 1.0 9.6560e-04 1.0 1.36e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 140 VecScale 66 1.0 4.0913e-04 1.0 1.21e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 295 VecCopy 24 1.0 3.8338e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 12855 1.0 1.0173e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 VecAXPY 607 1.0 2.9583e-03 1.0 4.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1680 VecAYPX 169 1.0 8.6975e-04 1.0 6.41e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 737 VecAXPBYCZ 34 1.0 1.1325e-04 1.0 2.77e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2443 VecWAXPY 54 1.0 1.4043e-04 1.0 2.30e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1637 VecMAXPY 301 1.0 5.6567e-03 1.0 2.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 421 VecSwap 103 1.0 3.2711e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyBegin 561 1.0 3.5629e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAssemblyEnd 561 1.0 6.4468e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 18427 1.0 1.5277e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 BuildTwoSidedF 554 1.0 1.9150e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 361 1.0 6.2765e-02 1.0 5.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 30 0 0 0 10 30 0 0 0 924 MatSolve 6108 1.0 6.3529e-02 1.0 9.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 49 0 0 0 10 49 0 0 0 1500 MatLUFactorSym 85 1.0 2.0353e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatLUFactorNum 85 1.0 2.2882e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 4 12 0 0 0 4 12 0 0 0 995 MatScale 4 1.0 2.2912e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1096 MatAssemblyBegin 108 1.0 6.7949e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 108 1.0 2.9209e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRow 33120 1.0 2.0407e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 MatGetRowIJ 85 1.0 1.2467e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 4 1.0 8.2304e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 85 1.0 7.8776e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAXPY 4 1.0 4.9517e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0 MatPtAP 4 1.0 4.4372e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 7 3 0 0 0 7 3 0 0 0 112 MatPtAPSymbolic 4 1.0 
2.7586e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0 MatPtAPNumeric 4 1.0 1.6756e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 298 MatGetSymTrans 4 1.0 3.6120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 12 1.0 4.9458e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 KSPSetUp 90 1.0 5.6815e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 3.8819e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 62 97 0 0 0 62 97 0 0 0 488 PCSetUp 90 1.0 6.4402e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10 12 0 0 0 10 12 0 0 0 354 PCSetUpOnBlocks 84 1.0 5.2499e-02 1.0 2.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 8 12 0 0 0 8 12 0 0 0 434 PCApply 12 1.0 3.4369e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 55 97 0 0 0 55 97 0 0 0 549 SNESSolve 1 1.0 3.9208e-01 1.0 1.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 63 97 0 0 0 63 97 0 0 0 483 SNESFunctionEval 2 1.0 3.2527e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 14 SNESJacobianEval 1 1.0 4.6706e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 739 639 9087400 0. Vector Scatter 290 289 189584 0. Index Set 1086 738 885136 0. IS L to G Mapping 110 109 2156656 0. Application Order 6 6 99952 0. MatMFFD 1 1 776 0. Matrix 189 189 19106368 0. Matrix Null Space 4 4 2432 0. Krylov Solver 90 90 122720 0. DMKSP interface 1 1 648 0. Preconditioner 90 90 89864 0. SNES 1 1 1328 0. SNESLineSearch 1 1 984 0. DMSNES 1 1 664 0. Distributed Mesh 2 2 9168 0. Star Forest Bipartite Graph 4 4 3168 0. Discrete System 2 2 1712 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 7.15256e-07 #PETSc Option Table entries: -ib_ksp_converged_reason -ib_ksp_monitor_true_residual -ib_snes_type ksponly -log_summary -stokes_ib_pc_level_ksp_richardson_self_scale -stokes_ib_pc_level_ksp_type richardson -stokes_ib_pc_level_pc_asm_local_type additive -stokes_ib_pc_level_pc_asm_type interpolate -stokes_ib_pc_level_sub_pc_factor_shift_type nonzero -stokes_ib_pc_level_sub_pc_type lu #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 ----------------------------------------- Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc Using PETSc arch: linux-opt ----------------------------------------- Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl ----------------------------------------- On Jan 14, 2016, at 4:31 PM, Bhalla, Amneet Pal Singh > wrote: On Jan 14, 2016, at 11:24 AM, Barry Smith > wrote: Also getting the results with Instruments or HPCToolkit would be useful (so long as we 
don't need to install HPCTool ourselves to see the results). @Barry ? Attached is the output from HPCToolkit profiler for all the operations done in solving 1 timestep Stokes+IB simulation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 14 21:16:57 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 14 Jan 2016 21:16:57 -0600 Subject: [petsc-users] HPCToolKit/HPCViewer on OS X In-Reply-To: <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> References: <669480D3-5CB2-4A0E-B6BD-FF4B702BB5A7@ad.unc.edu> <6780B32F-E05B-4FDB-9671-1A0C963B3B85@ad.unc.edu> <2A7F8BF5-7E17-4774-A13E-E75394109F95@email.unc.edu> <3586468C-25DB-46B6-B9E6-04853D234025@mcs.anl.gov> <8B3D6FE1-2EE7-4B5A-AA88-E1C18CBD61C4@ad.unc.edu> Message-ID: Ok, thanks. From the PETSc side of things this doesn't tell us anything new but does show the "missing" time in the solver (attached). > On Jan 14, 2016, at 6:31 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 14, 2016, at 11:24 AM, Barry Smith wrote: >> >> Also getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCTool ourselves to see the results). > > @Barry ? Attached is the > > output from HPCToolkit profiler for all the operations done in solving 1 timestep Stokes+IB simulation. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 11090 bytes Desc: not available URL: From praveenpetsc at gmail.com Fri Jan 15 00:18:29 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Fri, 15 Jan 2016 11:48:29 +0530 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: I?m struggling to figure out *undefined reference to `vectorset_*. I?ve included both petscvec.h and petscvec.h90 but the error appears again. I?m attaching makefile and code. any help will be appreciated. On Thu, Jan 14, 2016 at 9:39 PM, Matthew Knepley wrote: > On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar > wrote: > >> I?ve written a fortan code (F90) for domain decomposition.* I've >> specified **the paths of include files and libraries, but the >> compiler/linker still * >> >> >> *complained about undefined references.undefined reference to >> `vectorset_'* >> > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > >> >> *undefined reference to `dmdagetlocalinfo_'* >> > > This function is not supported in Fortran since it takes a structure. > > Thanks, > > Matt > > >> I?m attaching makefile and code. any help will be appreciated. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 477 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.F90 Type: text/x-fortran Size: 3139 bytes Desc: not available URL: From balay at mcs.anl.gov Fri Jan 15 00:35:09 2016 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 15 Jan 2016 00:35:09 -0600 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Matt already responded to this. You should be using VecSet() - not VectorSet(). I'm not sure where you got VectorSet() from... > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html Satish On Fri, 15 Jan 2016, praveen kumar wrote: > I?m struggling to figure out *undefined reference to `vectorset_*. I?ve > included both petscvec.h and petscvec.h90 but the error appears again. > I?m attaching makefile and code. any help will be appreciated. > > On Thu, Jan 14, 2016 at 9:39 PM, Matthew Knepley wrote: > > > On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar > > wrote: > > > >> I?ve written a fortan code (F90) for domain decomposition.* I've > >> specified **the paths of include files and libraries, but the > >> compiler/linker still * > >> > >> > >> *complained about undefined references.undefined reference to > >> `vectorset_'* > >> > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > > > > >> > >> *undefined reference to `dmdagetlocalinfo_'* > >> > > > > This function is not supported in Fortran since it takes a structure. > > > > Thanks, > > > > Matt > > > > > >> I?m attaching makefile and code. any help will be appreciated. > >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > > experiments is infinitely more interesting than any results to which their > > experiments lead. > > -- Norbert Wiener > > > From praveenpetsc at gmail.com Fri Jan 15 00:41:42 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Fri, 15 Jan 2016 12:11:42 +0530 Subject: [petsc-users] undefined reference error in make test In-Reply-To: References: Message-ID: Thanks a lot Satish. On Fri, Jan 15, 2016 at 12:05 PM, Satish Balay wrote: > Matt already responded to this. You should be using VecSet() - not > VectorSet(). > I'm not sure where you got VectorSet() from... > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > Satish > > On Fri, 15 Jan 2016, praveen kumar wrote: > > > I?m struggling to figure out *undefined reference to `vectorset_*. I?ve > > included both petscvec.h and petscvec.h90 but the error appears again. > > I?m attaching makefile and code. any help will be appreciated. > > > > On Thu, Jan 14, 2016 at 9:39 PM, Matthew Knepley > wrote: > > > > > On Thu, Jan 14, 2016 at 12:03 AM, praveen kumar < > praveenpetsc at gmail.com> > > > wrote: > > > > > >> I?ve written a fortan code (F90) for domain decomposition.* I've > > >> specified **the paths of include files and libraries, but the > > >> compiler/linker still * > > >> > > >> > > >> *complained about undefined references.undefined reference to > > >> `vectorset_'* > > >> > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html > > > > > > > > >> > > >> *undefined reference to `dmdagetlocalinfo_'* > > >> > > > > > > This function is not supported in Fortran since it takes a structure. > > > > > > Thanks, > > > > > > Matt > > > > > > > > >> I?m attaching makefile and code. any help will be appreciated. 
> > >> > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > > experiments is infinitely more interesting than any results to which > their > > > experiments lead. > > > -- Norbert Wiener > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 10:52:29 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 11:52:29 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> References: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> Message-ID: Hello, Barry, Thanks for your prompt reply. I ran the matrix-based solver with matrix-based SGS precondioner. I see your point. The profiling table is below and attached. So Matmult takes 4% time and PCApply takes 43% time. MatMult 636 1.0 9.0361e+00 1.0 9.21e+09 1.0 7.6e+03 1.1e+04 0.0e+00 4 85 52 17 0 4 85 52 17 0 3980 PCApply 636 1.0 8.7006e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03 43 0 0 0 24 43 0 0 0 24 0 The way I see it, the matrix-free solver spends most of the time (70%) on matmult or equivalently rhs evaluation. Every KSP iteration, one rhs evaluation is performed. This is much more costly than a matrix vector product in a matrix-based solver. Perhaps this is expected in matrix-free solver. I will start look at the rhs evaluation since it takes the most time. Thanks. Song Gao 2016-01-14 16:24 GMT-05:00 Barry Smith : > > So > > KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this > makes sense; the solver time is essentially the > multiply time plus the PCApply time. > > compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 > 1.1e+04 71 0100100 39 71 0100100 39 0 > LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 > SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 > VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 > > Depending on the "quality" of the preconditioner (if it is really good) > one expects the preconditioner time to be larger than the MatMult(). Only > for simple preconditioners (like Jacobi) does one see it being much less > than the MatMult(). For matrix based solvers the amount of work in SGS is > as large as the amount of work in the MatMult() if not more, so I would > expect the time of the preconditioner to be higher than the time of the > multiply. > > So based on knowing almost nothing I think the MatMult_ is taking more > time then it should unless you are ignoring (skipping) a lot of the terms > in your matrix-free SGS; then it is probably reasonable. > > Barry > > > > > On Jan 14, 2016, at 3:01 PM, Song Gao wrote: > > > > Hello, > > > > I am profiling a finite element Navier-Stokes solver. It uses the > Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a > matrix-free version of Symmetic Gauss-Seidel ). The log summary is > attached. Four events are registered. compute_rhs is compute rhs (used by > MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the > custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > > > I'm wondering, is the percent time of the events reasonable in the > table? I see 69% time is spent on matmult_mffd. Is it expected in > matrix-free method? What might be a good starting point of profiling this > solver? Thank you in advance. 
> > > > > > Song Gao > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_summary_matrix_based_version Type: application/octet-stream Size: 9473 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Jan 15 13:42:34 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 13:42:34 -0600 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> Message-ID: <97BFBC85-B01B-4A64-B162-05F340E9BCD6@mcs.anl.gov> > On Jan 15, 2016, at 10:52 AM, Song Gao wrote: > > Hello, Barry, > > Thanks for your prompt reply. I ran the matrix-based solver with matrix-based SGS precondioner. I see your point. The profiling table is below and attached. > > So Matmult takes 4% time and PCApply takes 43% time. > > MatMult 636 1.0 9.0361e+00 1.0 9.21e+09 1.0 7.6e+03 1.1e+04 0.0e+00 4 85 52 17 0 4 85 52 17 0 3980 > PCApply 636 1.0 8.7006e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.9e+03 43 0 0 0 24 43 0 0 0 24 0 > > > The way I see it, the matrix-free solver spends most of the time (70%) on matmult or equivalently rhs evaluation. Every KSP iteration, one rhs evaluation is performed. This is much more costly than a matrix vector product in a matrix-based solver. Sure, but if the matrix-free SGS mimics all the work of the right hand side function evaluation (which is has to if it truly is a a SGS sweep and not some approximation (where you drop certain terms in the right hand side function when you compute the SGS)) then the matrix-free SGS should be at least as expensive as the right hand side evaluation. Barry My guess is your SGS drops some terms so is only and approximation, but is still good enough as a preconditioner. > Perhaps this is expected in matrix-free solver. > > I will start look at the rhs evaluation since it takes the most time. > > Thanks. > Song Gao > > > > 2016-01-14 16:24 GMT-05:00 Barry Smith : > > So > > KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this makes sense; the solver time is essentially the > multiply time plus the PCApply time. > > compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 1.1e+04 71 0100100 39 71 0100100 39 0 > LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 > SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 > VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 > > Depending on the "quality" of the preconditioner (if it is really good) one expects the preconditioner time to be larger than the MatMult(). Only for simple preconditioners (like Jacobi) does one see it being much less than the MatMult(). For matrix based solvers the amount of work in SGS is as large as the amount of work in the MatMult() if not more, so I would expect the time of the preconditioner to be higher than the time of the multiply. > > So based on knowing almost nothing I think the MatMult_ is taking more time then it should unless you are ignoring (skipping) a lot of the terms in your matrix-free SGS; then it is probably reasonable. > > Barry > > > > > On Jan 14, 2016, at 3:01 PM, Song Gao wrote: > > > > Hello, > > > > I am profiling a finite element Navier-Stokes solver. It uses the Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a matrix-free version of Symmetic Gauss-Seidel ). The log summary is attached. 
Four events are registered. compute_rhs is compute rhs (used by MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > > > I'm wondering, is the percent time of the events reasonable in the table? I see 69% time is spent on matmult_mffd. Is it expected in matrix-free method? What might be a good starting point of profiling this solver? Thank you in advance. > > > > > > Song Gao > > > > > From jed at jedbrown.org Fri Jan 15 13:56:19 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 15 Jan 2016 12:56:19 -0700 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: Message-ID: <874mee62ho.fsf@jedbrown.org> Matthew Knepley writes: > The way I read this, you are taking about 23 iterates/solve, and most of > your work is residual computation which should > be highly parallelizable/vectorizable. This seems great to me. This in the sense that it's up to you to determine whether your matrix-free residual and preconditioning code is fast. This profile merely says that almost all of the run-time is in *your code*. If your code is fast, then this is good performance. If you can use a different algorithm to converge in fewer iterations, or a different representation to apply the operator faster, then you could do better. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 14:33:54 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 15:33:54 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: <97BFBC85-B01B-4A64-B162-05F340E9BCD6@mcs.anl.gov> References: <84EECF20-BB7A-40BD-820B-1FC44E847034@mcs.anl.gov> <97BFBC85-B01B-4A64-B162-05F340E9BCD6@mcs.anl.gov> Message-ID: Yes, you are right. In matrix-free SGS, the AUSM 2nd order inviscid fluxes are replace by a simpler first order numerical fluxes. 2016-01-15 14:42 GMT-05:00 Barry Smith : > > > On Jan 15, 2016, at 10:52 AM, Song Gao wrote: > > > > Hello, Barry, > > > > Thanks for your prompt reply. I ran the matrix-based solver with > matrix-based SGS precondioner. I see your point. The profiling table is > below and attached. > > > > So Matmult takes 4% time and PCApply takes 43% time. > > > > MatMult 636 1.0 9.0361e+00 1.0 9.21e+09 1.0 7.6e+03 > 1.1e+04 0.0e+00 4 85 52 17 0 4 85 52 17 0 3980 > > PCApply 636 1.0 8.7006e+01 1.0 0.00e+00 0.0 0.0e+00 > 0.0e+00 1.9e+03 43 0 0 0 24 43 0 0 0 24 0 > > > > > > The way I see it, the matrix-free solver spends most of the time (70%) > on matmult or equivalently rhs evaluation. Every KSP iteration, one rhs > evaluation is performed. This is much more costly than a matrix vector > product in a matrix-based solver. > > Sure, but if the matrix-free SGS mimics all the work of the right hand > side function evaluation (which is has to if it truly is a a SGS sweep and > not some approximation (where you drop certain terms in the right hand side > function when you compute the SGS)) then the matrix-free SGS should be at > least as expensive as the right hand side evaluation. > > Barry > > > My guess is your SGS drops some terms so is only and approximation, but is > still good enough as a preconditioner. > > > Perhaps this is expected in matrix-free solver. > > > > I will start look at the rhs evaluation since it takes the most time. > > > > Thanks. 
> > Song Gao > > > > > > > > 2016-01-14 16:24 GMT-05:00 Barry Smith : > > > > So > > > > KSPSolve is 96 % and MatMult is 70 % + PCApply 24 % = 94 % so this > makes sense; the solver time is essentially the > > multiply time plus the PCApply time. > > > > compute_rhs 1823 1.0 4.2119e+02 1.0 0.00e+00 0.0 4.4e+04 5.4e+04 > 1.1e+04 71 0100100 39 71 0100100 39 0 > > LU-SGS 1647 1.0 1.3590e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 23 0 0 0 0 23 0 0 0 0 0 > > SURFINT 1823 1.0 1.0647e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 17 0 0 0 0 17 0 0 0 0 0 > > VOLINT 1823 1.0 2.2373e+02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 35 0 0 0 0 35 0 0 0 0 0 > > > > Depending on the "quality" of the preconditioner (if it is really > good) one expects the preconditioner time to be larger than the MatMult(). > Only for simple preconditioners (like Jacobi) does one see it being much > less than the MatMult(). For matrix based solvers the amount of work in > SGS is as large as the amount of work in the MatMult() if not more, so I > would expect the time of the preconditioner to be higher than the time of > the multiply. > > > > So based on knowing almost nothing I think the MatMult_ is taking more > time then it should unless you are ignoring (skipping) a lot of the terms > in your matrix-free SGS; then it is probably reasonable. > > > > Barry > > > > > > > > > On Jan 14, 2016, at 3:01 PM, Song Gao > wrote: > > > > > > Hello, > > > > > > I am profiling a finite element Navier-Stokes solver. It uses the > Jacobian-free Newton Krylov method and a custom preconditoner LU-SGS (a > matrix-free version of Symmetic Gauss-Seidel ). The log summary is > attached. Four events are registered. compute_rhs is compute rhs (used by > MatMult_MFFD). SURFINT and VOLINT are parts of compute_rhs. LU-SGS is the > custom preconditioner. I didn't call PetscLogFlops so these flops are zeros. > > > > > > I'm wondering, is the percent time of the events reasonable in the > table? I see 69% time is spent on matmult_mffd. Is it expected in > matrix-free method? What might be a good starting point of profiling this > solver? Thank you in advance. > > > > > > > > > Song Gao > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 14:34:58 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 15:34:58 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: <874mee62ho.fsf@jedbrown.org> References: <874mee62ho.fsf@jedbrown.org> Message-ID: Thanks. I'll try to improve "my code" 2016-01-15 14:56 GMT-05:00 Jed Brown : > Matthew Knepley writes: > > The way I read this, you are taking about 23 iterates/solve, and most of > > your work is residual computation which should > > be highly parallelizable/vectorizable. This seems great to me. > > This in the sense that it's up to you to determine whether your > matrix-free residual and preconditioning code is fast. This profile > merely says that almost all of the run-time is in *your code*. If your > code is fast, then this is good performance. If you can use a different > algorithm to converge in fewer iterations, or a different representation > to apply the operator faster, then you could do better. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jan 15 14:38:43 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 14:38:43 -0600 Subject: [petsc-users] Profile a matrix-free solver. 
In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: > On Jan 15, 2016, at 2:34 PM, Song Gao wrote: > > Thanks. I'll try to improve "my code" Here you can benefit from Instruments since it can show line by line and loop level hotspots in your compute rhs Barry > > 2016-01-15 14:56 GMT-05:00 Jed Brown : > Matthew Knepley writes: > > The way I read this, you are taking about 23 iterates/solve, and most of > > your work is residual computation which should > > be highly parallelizable/vectorizable. This seems great to me. > > This in the sense that it's up to you to determine whether your > matrix-free residual and preconditioning code is fast. This profile > merely says that almost all of the run-time is in *your code*. If your > code is fast, then this is good performance. If you can use a different > algorithm to converge in fewer iterations, or a different representation > to apply the operator faster, then you could do better. > From ling.zou at inl.gov Fri Jan 15 14:39:40 2016 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Fri, 15 Jan 2016 13:39:40 -0700 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: Hi Song, I wonder if you have a reference paper on the preconditioning algorithm you are working on, i.e., using the 1st order flux for preconditioning purpose when your 'true' fluxes are evaluated using the 2nd order AUSM scheme. Best, Ling On Fri, Jan 15, 2016 at 1:34 PM, Song Gao wrote: > Thanks. I'll try to improve "my code" > > 2016-01-15 14:56 GMT-05:00 Jed Brown : > >> Matthew Knepley writes: >> > The way I read this, you are taking about 23 iterates/solve, and most of >> > your work is residual computation which should >> > be highly parallelizable/vectorizable. This seems great to me. >> >> This in the sense that it's up to you to determine whether your >> matrix-free residual and preconditioning code is fast. This profile >> merely says that almost all of the run-time is in *your code*. If your >> code is fast, then this is good performance. If you can use a different >> algorithm to converge in fewer iterations, or a different representation >> to apply the operator faster, then you could do better. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From song.gao2 at mail.mcgill.ca Fri Jan 15 14:58:20 2016 From: song.gao2 at mail.mcgill.ca (Song Gao) Date: Fri, 15 Jan 2016 15:58:20 -0500 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: Yes. http://www.sciencedirect.com/science/article/pii/S0021999198960764 It's on page 668 equation 4.6. Thanks 2016-01-15 15:39 GMT-05:00 Zou (Non-US), Ling : > Hi Song, I wonder if you have a reference paper on the preconditioning > algorithm you are working on, i.e., using the 1st order flux for > preconditioning purpose when your 'true' fluxes are evaluated using the 2nd > order AUSM scheme. > > Best, > > Ling > > On Fri, Jan 15, 2016 at 1:34 PM, Song Gao > wrote: > >> Thanks. I'll try to improve "my code" >> >> 2016-01-15 14:56 GMT-05:00 Jed Brown : >> >>> Matthew Knepley writes: >>> > The way I read this, you are taking about 23 iterates/solve, and most >>> of >>> > your work is residual computation which should >>> > be highly parallelizable/vectorizable. This seems great to me. >>> >>> This in the sense that it's up to you to determine whether your >>> matrix-free residual and preconditioning code is fast. 
This profile >>> merely says that almost all of the run-time is in *your code*. If your >>> code is fast, then this is good performance. If you can use a different >>> algorithm to converge in fewer iterations, or a different representation >>> to apply the operator faster, then you could do better. >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Fri Jan 15 15:02:42 2016 From: ling.zou at inl.gov (Zou (Non-US), Ling) Date: Fri, 15 Jan 2016 14:02:42 -0700 Subject: [petsc-users] Profile a matrix-free solver. In-Reply-To: References: <874mee62ho.fsf@jedbrown.org> Message-ID: Thank you very much! Ling On Fri, Jan 15, 2016 at 1:58 PM, Song Gao wrote: > Yes. > http://www.sciencedirect.com/science/article/pii/S0021999198960764 > > > It's on page 668 equation 4.6. > > Thanks > > 2016-01-15 15:39 GMT-05:00 Zou (Non-US), Ling : > >> Hi Song, I wonder if you have a reference paper on the preconditioning >> algorithm you are working on, i.e., using the 1st order flux for >> preconditioning purpose when your 'true' fluxes are evaluated using the 2nd >> order AUSM scheme. >> >> Best, >> >> Ling >> >> On Fri, Jan 15, 2016 at 1:34 PM, Song Gao >> wrote: >> >>> Thanks. I'll try to improve "my code" >>> >>> 2016-01-15 14:56 GMT-05:00 Jed Brown : >>> >>>> Matthew Knepley writes: >>>> > The way I read this, you are taking about 23 iterates/solve, and most >>>> of >>>> > your work is residual computation which should >>>> > be highly parallelizable/vectorizable. This seems great to me. >>>> >>>> This in the sense that it's up to you to determine whether your >>>> matrix-free residual and preconditioning code is fast. This profile >>>> merely says that almost all of the run-time is in *your code*. If your >>>> code is fast, then this is good performance. If you can use a different >>>> algorithm to converge in fewer iterations, or a different representation >>>> to apply the operator faster, then you could do better. >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Fri Jan 15 17:33:21 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 15 Jan 2016 23:33:21 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() Message-ID: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> Hi Barry, In our code at each timestep we build MG level smoothers using PETSc KSP solvers. We are using a PETSc function KSPSetFromOptions() after we set some default values to the KSP. However, the profiler is showing that PetscOptionsFindPair_Private() is taking about 14% of total runtime. We ran the code for 100 timesteps, and preconditioner is built everytime step. 
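Concretely, the pattern just described looks roughly like the sketch below (a minimal sketch only; the helper name, the per-level Mat/Vec arguments, and the options prefix are illustrative placeholders, not the actual smoother-construction code):

    #include <petscksp.h>

    /* Hypothetical per-timestep rebuild of one level smoother; A_level, b_level,
       and x_level are assumed to come from the regridded AMR hierarchy. */
    static PetscErrorCode BuildAndApplyLevelSmoother(Mat A_level, Vec b_level, Vec x_level)
    {
      KSP            smoother;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPCreate(PetscObjectComm((PetscObject)A_level), &smoother);CHKERRQ(ierr); /* fresh object every timestep */
      ierr = KSPSetOptionsPrefix(smoother, "stokes_ib_pc_level_");CHKERRQ(ierr);        /* illustrative prefix only    */
      ierr = KSPSetOperators(smoother, A_level, A_level);CHKERRQ(ierr);
      ierr = KSPSetType(smoother, KSPRICHARDSON);CHKERRQ(ierr);  /* default set in code ...               */
      ierr = KSPSetFromOptions(smoother);CHKERRQ(ierr);          /* ... then overridden from the database */
      ierr = KSPSolve(smoother, b_level, x_level);CHKERRQ(ierr); /* KSPSolve() also queries the options
                                                                    database for viewer/monitor flags     */
      ierr = KSPDestroy(&smoother);CHKERRQ(ierr);                /* torn down and recreated next timestep */
      PetscFunctionReturn(0);
    }

With many short subdomain/level solves per timestep, those per-call string lookups in the options database are what accumulate.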
I am posting a sequence of calls to KSPSolve_Richardson that shows getting PETScOptions adds up to a lot of cost "KSPSolve_Richardson" 8.80e+05 12.7% "PCApplyBAorAB" 5.25e+05 7.6% "PCApply" 4.85e+05 7.0% "PCApply_ASM" 4.85e+05 7.0% "KSPSolve" 4.53e+05 6.6% "KSPSolve_PREONLY" 2.06e+05 3.0% "PetscObjectViewFromOptions" 3.19e+04 0.5% "PetscObjectViewFromOptions" 2.39e+04 0.3% "PetscOptionsGetBool" 2.39e+04 0.3% "PetscOptionsHasName" 2.39e+04 0.3% "PetscOptionsGetBool" 2.39e+04 0.3% "PetscOptionsHasName" 1.60e+04 0.2% "PetscOptionsGetBool" 1.60e+04 0.2% "PetscObjectViewFromOptions" 1.60e+04 0.2% "KSPReasonViewFromOptions" 1.60e+04 0.2% "PetscOptionsGetBool" 1.56e+04 0.2% "PetscObjectViewFromOptions" 7.98e+03 0.1% "KSPSetUpOnBlocks" 7.98e+03 0.1% "PetscOptionsGetBool" 7.98e+03 0.1% "VecSet" 7.97e+03 0.1% "PetscObjectViewFromOptions" 7.92e+03 0.1% Do you have some suggestions as to doing it in a fast way -- maybe parsing options only once in the simulation and making populating KSP options essentially a no-op? Thanks, --Amneet From jed at jedbrown.org Fri Jan 15 17:53:11 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 15 Jan 2016 16:53:11 -0700 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> Message-ID: <8760yu4cyg.fsf@jedbrown.org> "Bhalla, Amneet Pal S" writes: > Hi Barry, > > In our code at each timestep we build MG level smoothers using PETSc > KSP solvers. Do you need to create new objects versus merely resetting them? I suspect that calling KSPReset() between timesteps instead of creating a new object and calling KSPSetFromOptions() will fix your performance woes. > We are using a PETSc function KSPSetFromOptions() after we set some > default values to the KSP. However, the profiler is showing that > PetscOptionsFindPair_Private() is taking about 14% of total runtime. > We ran the code for 100 timesteps, and preconditioner is built > everytime step. I am posting a sequence of calls to > KSPSolve_Richardson that shows getting PETScOptions adds up to a lot > of cost > > "KSPSolve_Richardson" 8.80e+05 12.7% > "PCApplyBAorAB" 5.25e+05 7.6% > "PCApply" 4.85e+05 7.0% > "PCApply_ASM" 4.85e+05 7.0% > "KSPSolve" 4.53e+05 6.6% > "KSPSolve_PREONLY" 2.06e+05 3.0% > "PetscObjectViewFromOptions" 3.19e+04 0.5% > "PetscObjectViewFromOptions" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.60e+04 0.2% > "PetscObjectViewFromOptions" 1.60e+04 0.2% > "KSPReasonViewFromOptions" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.56e+04 0.2% > "PetscObjectViewFromOptions" 7.98e+03 0.1% > "KSPSetUpOnBlocks" 7.98e+03 0.1% > "PetscOptionsGetBool" 7.98e+03 0.1% > "VecSet" 7.97e+03 0.1% > "PetscObjectViewFromOptions" 7.92e+03 0.1% > > Do you have some suggestions as to doing it in a fast way -- maybe parsing options only once in the simulation and making populating KSP > options essentially a no-op? > > Thanks, > --Amneet -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Jan 15 17:59:21 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 17:59:21 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> Message-ID: <6C48C329-D686-42F8-8C04-20B59485D742@mcs.anl.gov> Amneet, Thanks for bringing this to our attention. The long term design goal in PETSc is that the PetscOptions... calls are all made from the XXXSetFromOptions() calls and not within the numerical solver portions. Unfortunately this is not as easy to do as it might seem; hence there are a bunch of them scattered within the solver portions. In particular the worse culprit is KSPSolve(). You can run the following experiment: edit src/ksp/ksp/interface/itfunc.c and locate the function KSPSolve() now comment out all the lines with the work Option in them (I count about 17 of them) now do make gnumake in that directory (of course with optimized build) then rerun your exact same code that you report for from below. How much faster is the total time and how much percentage are the troublesome Options calls now? In other words how much does this change help? A dramatic difference would motivate us to fix this problem sooner rather than later. Barry > On Jan 15, 2016, at 5:33 PM, Bhalla, Amneet Pal S wrote: > > > Hi Barry, > > In our code at each timestep we build MG level smoothers using PETSc KSP solvers. We are using a PETSc function KSPSetFromOptions() > after we set some default values to the KSP. However, the profiler is showing that PetscOptionsFindPair_Private() is taking about 14% of total runtime. > We ran the code for 100 timesteps, and preconditioner is built everytime step. I am posting a sequence of calls to KSPSolve_Richardson that shows > getting PETScOptions adds up to a lot of cost > > "KSPSolve_Richardson" 8.80e+05 12.7% > "PCApplyBAorAB" 5.25e+05 7.6% > "PCApply" 4.85e+05 7.0% > "PCApply_ASM" 4.85e+05 7.0% > "KSPSolve" 4.53e+05 6.6% > "KSPSolve_PREONLY" 2.06e+05 3.0% > "PetscObjectViewFromOptions" 3.19e+04 0.5% > "PetscObjectViewFromOptions" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 2.39e+04 0.3% > "PetscOptionsGetBool" 2.39e+04 0.3% > "PetscOptionsHasName" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.60e+04 0.2% > "PetscObjectViewFromOptions" 1.60e+04 0.2% > "KSPReasonViewFromOptions" 1.60e+04 0.2% > "PetscOptionsGetBool" 1.56e+04 0.2% > "PetscObjectViewFromOptions" 7.98e+03 0.1% > "KSPSetUpOnBlocks" 7.98e+03 0.1% > "PetscOptionsGetBool" 7.98e+03 0.1% > "VecSet" 7.97e+03 0.1% > "PetscObjectViewFromOptions" 7.92e+03 0.1% > > Do you have some suggestions as to doing it in a fast way -- maybe parsing options only once in the simulation and making populating KSP > options essentially a no-op? > > Thanks, > --Amneet From amneetb at live.unc.edu Fri Jan 15 18:15:00 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 00:15:00 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <8760yu4cyg.fsf@jedbrown.org> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> Message-ID: <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> On Jan 15, 2016, at 3:53 PM, Jed Brown > wrote: Do you need to create new objects versus merely resetting them? 
I suspect that calling KSPReset() between timesteps instead of creating a new object and calling KSPSetFromOptions() will fix your performance woes. We definitely need to destroy Mat associated with KSP everytime. This is a dynamic fluid-structure interaction problem on AMR grid, where the Cartesian grid and the structure moves at every timestep. Is it possible to reset a KSP with a different Mat? Are you suggesting to call KSPCreate() and KSPSetFromOptions() only once at the beginning of simulation, KSPReset() after every timestep, and KSPDestroy() at the end of the simulation? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 15 19:40:51 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 15 Jan 2016 19:40:51 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> Message-ID: On Fri, Jan 15, 2016 at 6:15 PM, Bhalla, Amneet Pal S wrote: > > > On Jan 15, 2016, at 3:53 PM, Jed Brown wrote: > > Do you need to create new objects versus merely resetting them? I > suspect that calling KSPReset() between timesteps instead of creating a > new object and calling KSPSetFromOptions() will fix your performance > woes. > > > We definitely need to destroy Mat associated with KSP everytime. This is a > dynamic fluid-structure > interaction problem on AMR grid, where the Cartesian grid and the > structure moves at every timestep. > Is it possible to reset a KSP with a different Mat? Are you suggesting to > call KSPCreate() and > KSPSetFromOptions() only once at the beginning of simulation, KSPReset() > after every timestep, > and KSPDestroy() at the end of the simulation? > That is how KSPReset() is supposed to work, but no one is really exercising it now. I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. However, Jed is correct that this is probably the best design. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Jan 15 20:26:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 15 Jan 2016 20:26:17 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> Message-ID: <91D183F4-2BCE-4A06-A13A-8330619E4207@mcs.anl.gov> SNES/KSPReset() destroys all the vectors and matrices but keeps all the options that have been set for the object. So using it saves rebuilding those objects. For large systems Reset would save only a trivial amount of rebuild. > On Jan 15, 2016, at 6:15 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 15, 2016, at 3:53 PM, Jed Brown wrote: >> >> Do you need to create new objects versus merely resetting them? I >> suspect that calling KSPReset() between timesteps instead of creating a >> new object and calling KSPSetFromOptions() will fix your performance >> woes. > > We definitely need to destroy Mat associated with KSP everytime. 
This is a dynamic fluid-structure > interaction problem on AMR grid, where the Cartesian grid and the structure moves at every timestep. > Is it possible to reset a KSP with a different Mat? Are you suggesting to call KSPCreate() and > KSPSetFromOptions() only once at the beginning of simulation, KSPReset() after every timestep, > and KSPDestroy() at the end of the simulation? From amneetb at live.unc.edu Fri Jan 15 23:34:22 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 05:34:22 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> Message-ID: <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Sat Jan 16 07:12:03 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sat, 16 Jan 2016 13:12:03 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> Message-ID: <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 16 12:20:56 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 12:20:56 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: > On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene wrote: > > >> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S wrote: >> >> >> >>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote: >>> >>> I am inclined to try >>> Barry's experiment first, since this may have bugs that we have not yet discovered. >> >> Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. 
>> If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) >> and not KSPSetFromOptions() itself (1.6%). > > Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. > > Thanks, > > -- Boyce From amneetb at live.unc.edu Sat Jan 16 15:00:08 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 21:00:08 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu>, Message-ID: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it's the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. 
Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Sat Jan 16 15:04:37 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sat, 16 Jan 2016 21:04:37 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S > wrote: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) Thanks, -- Boyce Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Sat Jan 16 15:06:46 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 21:06:46 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> , Message-ID: Does Instruments save results somewhere (like in a cascade view) that I can send to Barry? 
--Amneet Bhalla On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S > wrote: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. Ok, I tried Barry's suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it's the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) Thanks, -- Boyce Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From boyceg at email.unc.edu Sat Jan 16 15:10:56 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sat, 16 Jan 2016 21:10:56 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> On Jan 16, 2016, at 4:06 PM, Bhalla, Amneet Pal S > wrote: Does Instruments save results somewhere (like in a cascade view) that I can send to Barry? Yes --- "save as..." will save the current trace, and then you can open it back up. -- Boyce --Amneet Bhalla On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S > wrote: --Amneet Bhalla On Jan 16, 2016, at 10:21 AM, Barry Smith > wrote: On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene > wrote: On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: On Jan 15, 2016, at 5:40 PM, Matthew Knepley > wrote: I am inclined to try Barry's experiment first, since this may have bugs that we have not yet discovered. 
Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) and not KSPSetFromOptions() itself (1.6%). Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? No that is a different issue. In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) Thanks, -- Boyce Let me know if you would like to try that. Barry * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. Thanks, -- Boyce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 16 15:13:38 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 15:13:38 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> Message-ID: > On Jan 16, 2016, at 3:10 PM, Griffith, Boyce Eugene wrote: > > >> On Jan 16, 2016, at 4:06 PM, Bhalla, Amneet Pal S wrote: >> >> Does Instruments save results somewhere (like in a cascade view) that I can send to Barry? > > Yes --- "save as..." will save the current trace, and then you can open it back up. Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won't. Barry > > -- Boyce > >> --Amneet Bhalla >> >> On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene wrote: >> >>> >>>> On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S wrote: >>>> >>>> >>>> >>>> --Amneet Bhalla >>>> >>>> On Jan 16, 2016, at 10:21 AM, Barry Smith wrote: >>>> >>>>> >>>>>> On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene wrote: >>>>>> >>>>>> >>>>>>> On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote: >>>>>>>> >>>>>>>> I am inclined to try >>>>>>>> Barry's experiment first, since this may have bugs that we have not yet discovered. >>>>>>> >>>>>>> Ok, I tried Barry?s suggestion. The runtime for PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. 
>>>>>>> If I am getting it right, it?s the petsc options in the KSPSolve() that is sucking up nontrivial amount of time (14 - 1.6) >>>>>>> and not KSPSetFromOptions() itself (1.6%). >>>>>> >>>>>> Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that also bypass these calls to PetscOptionsXXX? >>>>> >>>>> No that is a different issue. >>>>> >>>>> In the short term I recommend when running optimized/production you work with a PETSc with those Options checking in KSPSolve commented out, you don't use them anyways*. Since you are using ASM with many subdomains there are many "fast" calls to KSPSolve which is why for your particular case the the PetscOptionsFindPair_Private takes so much time. >>>>> >>>>> Now that you have eliminated this issue I would be very interested in seeing the HPCToolKit or Instruments profiling of the code to see hot spots in the PETSc solver configuration you are using. Thanks >>>> >>>> Barry --- the best way and the least back and forth way would be if I can send you the files (maybe off-list) that you can view in HPCViewer, which is a light weight java script app. You can view which the calling context (which petsc function calls which internal petsc routine) in a cascade form. If I send you an excel sheet, it would be in a flat view and not that useful for serious profiling. >>> >>> Amneet, can you just run with OS X Instruments, which Barry already knows how to use (right Barry?)? :-) >>> >>> Thanks, >>> >>> -- Boyce >>> >>>> >>>> Let me know if you would like to try that. >>>>> >>>>> Barry >>>>> >>>>> * Eventually we'll switch to a KSPPreSolveMonitorSet() and KSPPostSolveMonitorSet() model to eliminate this overhead but still have the functionality. >>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -- Boyce >>>>> >>> > From knepley at gmail.com Sat Jan 16 08:41:44 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 16 Jan 2016 08:41:44 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> Message-ID: On Sat, Jan 16, 2016 at 7:12 AM, Griffith, Boyce Eugene < boyceg at email.unc.edu> wrote: > > On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S > wrote: > > > > On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote: > > I am inclined to try > Barry's experiment first, since this may have bugs that we have not yet > discovered. > > > Ok, I tried Barry?s suggestion. The runtime for > PetscOptionsFindPair_Private() fell from 14% to mere 1.6%. > If I am getting it right, it?s the petsc options in the KSPSolve() that is > sucking up nontrivial amount of time (14 - 1.6) > and not KSPSetFromOptions() itself (1.6%). > > > Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would > that also bypass these calls to PetscOptionsXXX? > No, we have to fix KSPSolve(). We will do it right now. Thanks, Matt > Thanks, > > -- Boyce > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
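For readers following along, a minimal sketch of the KSP-reuse pattern Boyce is asking about, i.e. configuring a solver once and calling it many times so that KSPSetFromOptions() and its options-database queries run once rather than per solve. All names here are illustrative and this is not the code under discussion; it is only the generic pattern.

#include <petscksp.h>

/* Sketch: one configured KSP reused for many right-hand sides. */
PetscErrorCode solve_many(Mat A, Vec *rhs, Vec *sol, PetscInt nsolves)
{
  KSP            ksp;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* options database read here, once */
  for (i = 0; i < nsolves; ++i) {
    ierr = KSPSolve(ksp, rhs[i], sol[i]);CHKERRQ(ierr);
  }
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Note that, as Barry points out above, the queries at issue in this thread are made inside KSPSolve() itself, so this pattern only amortizes the user-side setup; removing the per-solve queries required the PETSc-side change Matt mentions.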
URL: From amneetb at live.unc.edu Sat Jan 16 17:09:26 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 23:09:26 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> Message-ID: <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> On Jan 16, 2016, at 1:13 PM, Barry Smith > wrote: Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. http://hpctoolkit.org/download/hpcviewer/ Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling under Calling Context View, Callers View and Flat View. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hpctoolkit-main2d-database.zip Type: application/zip Size: 1076038 bytes Desc: hpctoolkit-main2d-database.zip URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 16 17:46:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 17:46:17 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: Just as I feared. HPC software with bad dependencies, oh well charging ahead anyways > On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: > > > > >> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >> >> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. > > http://hpctoolkit.org/download/hpcviewer/ > > Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to > fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling > under Calling Context View, Callers View and Flat View. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Untitled.png Type: image/png Size: 748263 bytes Desc: not available URL: From amneetb at live.unc.edu Sat Jan 16 17:58:51 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sat, 16 Jan 2016 23:58:51 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: On Jan 16, 2016, at 3:46 PM, Barry Smith > wrote: Just as I feared. HPC software with bad dependencies, oh well charging ahead anyways Hmm... I have latest Java on my system. Can you try downloading it on a different browser (say Chrome)? Probably Safari is trying to unzip the file itself. You need to unzip by command line as Justin suggested. [cid:97DF211D-D876-4B67-B6E7-E002DA2EE95A] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2016-01-16 at 3.53.58 PM.png Type: image/png Size: 3467612 bytes Desc: Screen Shot 2016-01-16 at 3.53.58 PM.png URL: From bsmith at mcs.anl.gov Sat Jan 16 18:00:14 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 18:00:14 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. Barry > On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >> >> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. > > http://hpctoolkit.org/download/hpcviewer/ > > Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to > fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling > under Calling Context View, Callers View and Flat View. 
> > > > > From bsmith at mcs.anl.gov Sat Jan 16 18:05:17 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 18:05:17 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: No problem, I got it installed. I just like grumble about HPC people who use Java :-) > On Jan 16, 2016, at 5:58 PM, Bhalla, Amneet Pal S wrote: > > > >> On Jan 16, 2016, at 3:46 PM, Barry Smith wrote: >> >> Just as I feared. HPC software with bad dependencies, oh well charging ahead anyways > > Hmm... I have latest Java on my system. Can you try downloading it on a different browser (say Chrome)? Probably Safari is > trying to unzip the file itself. You need to unzip by command line as Justin suggested. > > > From amneetb at live.unc.edu Sat Jan 16 18:07:31 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Sun, 17 Jan 2016 00:07:31 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: <6F735708-A361-4084-A0B6-C7C7F0AB7B27@ad.unc.edu> On Jan 16, 2016, at 4:00 PM, Barry Smith > wrote: If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. I am using ?next? where Matt has pushed some code for multiplicative ASM (MSM). Is it available there too? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Jan 16 18:08:34 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 16 Jan 2016 18:08:34 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <6F735708-A361-4084-A0B6-C7C7F0AB7B27@ad.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <6F735708-A361-4084-A0B6-C7C7F0AB7B27@ad.unc.edu> Message-ID: On Sat, Jan 16, 2016 at 6:07 PM, Bhalla, Amneet Pal S wrote: > > > On Jan 16, 2016, at 4:00 PM, Barry Smith wrote: > > If you are using the master branch of PETSc two users gave us a nifty new > profiler that is "PETSc style" but shows the hierarchy of PETSc solvers > time and flop etc. > > > I am using ?next? where Matt has pushed some code for multiplicative ASM > (MSM). Is it available there too? > That should now be in 'master'. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
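For context on where the "little LU factorizations" Barry sees in the profile live: with PCASM each local block owns its own KSP/PC, and those can be retrieved and reconfigured after setup. A rough sketch of switching the subdomain solves from exact LU to ILU(k) follows; it is purely illustrative and assumes a KSP whose operators are already set and whose PC is ASM (e.g. via -pc_type asm), not the actual solver configuration used in this thread.

#include <petscksp.h>

/* Sketch: reconfigure the ASM subdomain solvers to ILU(2). */
PetscErrorCode tweak_subdomain_solvers(KSP ksp)
{
  PC             pc, subpc;
  KSP           *subksp;
  PetscInt       nlocal, first, i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* subdomain KSPs exist only after setup */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCASMGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
  for (i = 0; i < nlocal; ++i) {
    ierr = KSPSetType(subksp[i], KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
    ierr = PCSetType(subpc, PCILU);CHKERRQ(ierr);
    ierr = PCFactorSetLevels(subpc, 2);CHKERRQ(ierr);  /* ILU(2) instead of full LU */
  }
  PetscFunctionReturn(0);
}

The same effect is more commonly obtained from the command line with -sub_ksp_type preonly -sub_pc_type ilu -sub_pc_factor_levels 2, which is the form the options take elsewhere in this thread.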
URL: From boyceg at email.unc.edu Sat Jan 16 20:25:50 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sun, 17 Jan 2016 02:25:50 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> Message-ID: <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> > On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: > > > Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks? Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance. -- Boyce > If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. > > Barry > >> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: >> >> >> >>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >>> >>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. >> >> http://hpctoolkit.org/download/hpcviewer/ >> >> Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to >> fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling >> under Calling Context View, Callers View and Flat View. >> >> >> >> >> > From bsmith at mcs.anl.gov Sat Jan 16 21:46:45 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 16 Jan 2016 21:46:45 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> Message-ID: Boyce, Of course anything is possible in software. But I expect an optimization to not rebuild common submatrices/factorization requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE). 
I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuses a common one). The PCApply_ASM() should be hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in PCSetUp_ASM() (and maybe the common sub matrices) then the PCDestroy_ASM() should also work unchanged Good luck, Barry > On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene wrote: > > >> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: >> >> >> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. > > Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks? > > Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance. > > -- Boyce > >> If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. >> >> Barry >> >>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: >>> >>> >>> >>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >>>> >>>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. >>> >>> http://hpctoolkit.org/download/hpcviewer/ >>> >>> Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to >>> fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling >>> under Calling Context View, Callers View and Flat View. >>> >>> >>> >>> >>> >> > From boyceg at email.unc.edu Sun Jan 17 10:13:15 2016 From: boyceg at email.unc.edu (Griffith, Boyce Eugene) Date: Sun, 17 Jan 2016 16:13:15 +0000 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> Message-ID: <1A4122A1-E0D9-4E81-8636-D3C4163298A5@email.unc.edu> Barry -- Another random thought --- are these smallish direct solves things that make sense to (try to) offload to a GPU? Thanks, -- Boyce > On Jan 16, 2016, at 10:46 PM, Barry Smith wrote: > > > Boyce, > > Of course anything is possible in software. 
But I expect an optimization to not rebuild common submatrices/factorization requires a custom PCSetUp_ASM() rather than some PETSc option that we could add (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE). > > I would start by copying PCSetUp_ASM(), stripping out all the setup stuff that doesn't relate to your code and then mark identical domains so you don't need to call MatGetSubMatrices() on those domains and don't create a new KSP for each one of those subdomains (but reuses a common one). The PCApply_ASM() should be hopefully be reusable so long as you have created the full array of KSP objects (some of which will be common). If you increase the reference counts of the common KSP in > PCSetUp_ASM() (and maybe the common sub matrices) then the PCDestroy_ASM() should also work unchanged > > Good luck, > > Barry > >> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene wrote: >> >> >>> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: >>> >>> >>> Ok, I looked at your results in hpcviewer and don't see any surprises. The PETSc time is in the little LU factorizations, the LU solves and the matrix-vector products as it should be. Not much can be done on speeding these except running on machines with high memory bandwidth. >> >> Looks like LU factorizations are about 25% for this particular case. Many of these little subsystems are going to be identical (many will correspond to constant coefficient Stokes), and it is fairly easy to figure out which are which. How hard would it be to modify PCASM to allow for the specification of one or more "default" KSPs that can be used for specified blocks? >> >> Of course, we'll also look into tweaking the subdomain solves --- it may not even be necessary to do exact subdomain solves to get reasonable MG performance. >> >> -- Boyce >> >>> If you are using the master branch of PETSc two users gave us a nifty new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers time and flop etc. You can run with -log_view :filename.xml:ascii_xml and then open the file with a browser (for example open -f Safari filename.xml) or email the file. >>> >>> Barry >>> >>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S wrote: >>>> >>>> >>>> >>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: >>>>> >>>>> Either way is fine so long as I don't have to install a ton of stuff; which it sounds like I won?t. >>>> >>>> http://hpctoolkit.org/download/hpcviewer/ >>>> >>>> Unzip HPCViewer for MacOSX with command line and drag the unzipped folder to Applications. You will be able to >>>> fire HPCViewer from LaunchPad. Point it to this attached directory. You will be able to see three different kind of profiling >>>> under Calling Context View, Callers View and Flat View. 
>>>> >>>> >>>> >>>> >>>> >>> >> > From knepley at gmail.com Sun Jan 17 12:17:21 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 17 Jan 2016 12:17:21 -0600 Subject: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private() In-Reply-To: <1A4122A1-E0D9-4E81-8636-D3C4163298A5@email.unc.edu> References: <93AA40B8-C0B4-46E6-B258-08EEA7DAE30C@ad.unc.edu> <8760yu4cyg.fsf@jedbrown.org> <8144B3B0-7761-4147-99BD-7C84230C733C@ad.unc.edu> <906F4402-B8CD-46A5-8858-DE1517F29E46@ad.unc.edu> <4B163265-8984-454E-B88D-6343BDCDD5E4@email.unc.edu> <258B10DE-29A8-4C89-95D1-9544748819C0@email.unc.edu> <281383E4-00B6-4911-8184-D82E04F88EF3@ad.unc.edu> <0ABFB383-1822-4E5C-8A28-680E39032A4F@email.unc.edu> <1A4122A1-E0D9-4E81-8636-D3C4163298A5@email.unc.edu> Message-ID: On Sun, Jan 17, 2016 at 10:13 AM, Griffith, Boyce Eugene < boyceg at email.unc.edu> wrote: > Barry -- > > Another random thought --- are these smallish direct solves things that > make sense to (try to) offload to a GPU? > Possibly, but the only clear-cut wins are for BLAS3, so we would need to stack up the identical solves. Matt > Thanks, > > -- Boyce > > > On Jan 16, 2016, at 10:46 PM, Barry Smith wrote: > > > > > > Boyce, > > > > Of course anything is possible in software. But I expect an > optimization to not rebuild common submatrices/factorization requires a > custom PCSetUp_ASM() rather than some PETSc option that we could add > (especially if you are using Matt's PC_COMPOSITE_MULTIPLICATIVE). > > > > I would start by copying PCSetUp_ASM(), stripping out all the setup > stuff that doesn't relate to your code and then mark identical domains so > you don't need to call MatGetSubMatrices() on those domains and don't > create a new KSP for each one of those subdomains (but reuses a common > one). The PCApply_ASM() should be hopefully be reusable so long as you have > created the full array of KSP objects (some of which will be common). If > you increase the reference counts of the common KSP in > > PCSetUp_ASM() (and maybe the common sub matrices) then the > PCDestroy_ASM() should also work unchanged > > > > Good luck, > > > > Barry > > > >> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene < > boyceg at email.unc.edu> wrote: > >> > >> > >>> On Jan 16, 2016, at 7:00 PM, Barry Smith wrote: > >>> > >>> > >>> Ok, I looked at your results in hpcviewer and don't see any surprises. > The PETSc time is in the little LU factorizations, the LU solves and the > matrix-vector products as it should be. Not much can be done on speeding > these except running on machines with high memory bandwidth. > >> > >> Looks like LU factorizations are about 25% for this particular case. > Many of these little subsystems are going to be identical (many will > correspond to constant coefficient Stokes), and it is fairly easy to figure > out which are which. How hard would it be to modify PCASM to allow for the > specification of one or more "default" KSPs that can be used for specified > blocks? > >> > >> Of course, we'll also look into tweaking the subdomain solves --- it > may not even be necessary to do exact subdomain solves to get reasonable MG > performance. > >> > >> -- Boyce > >> > >>> If you are using the master branch of PETSc two users gave us a nifty > new profiler that is "PETSc style" but shows the hierarchy of PETSc solvers > time and flop etc. You can run with -log_view :filename.xml:ascii_xml and > then open the file with a browser (for example open -f Safari filename.xml) > or email the file. 
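To make Barry's outline above a little more concrete: the heart of the idea is that identical subdomains share one factored KSP instead of each owning their own, with reference counting keeping the usual destruction path correct. The following is a highly simplified, hypothetical sketch of just that bookkeeping; the arrays and the function itself are invented for illustration, and a real custom PCSetUp_ASM() does considerably more (including skipping MatGetSubMatrices() for the repeated blocks).

#include <petscksp.h>

/* Hypothetical sketch: one shared KSP for all subdomains marked identical,
   a private KSP for the rest. */
PetscErrorCode assign_subdomain_ksps(Mat *submats, PetscBool *is_identical,
                                     PetscInt n, KSP *subksp)
{
  KSP            common = NULL;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  for (i = 0; i < n; ++i) {
    if (is_identical[i]) {
      if (!common) {                       /* set up the representative block once */
        ierr = KSPCreate(PETSC_COMM_SELF, &common);CHKERRQ(ierr);
        ierr = KSPSetOptionsPrefix(common, "sub_");CHKERRQ(ierr);
        ierr = KSPSetOperators(common, submats[i], submats[i]);CHKERRQ(ierr);
        ierr = KSPSetFromOptions(common);CHKERRQ(ierr);
      }
      ierr = PetscObjectReference((PetscObject)common);CHKERRQ(ierr);
      subksp[i] = common;                  /* shared: one factorization serves many blocks */
    } else {
      ierr = KSPCreate(PETSC_COMM_SELF, &subksp[i]);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(subksp[i], "sub_");CHKERRQ(ierr);
      ierr = KSPSetOperators(subksp[i], submats[i], submats[i]);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(subksp[i]);CHKERRQ(ierr);
    }
  }
  if (common) { ierr = KSPDestroy(&common);CHKERRQ(ierr); }  /* drop the creation reference; shared slots keep theirs */
  PetscFunctionReturn(0);
}

The extra reference taken for each shared slot is what lets a per-subdomain KSPDestroy() in the destroy path work unchanged, which is the point Barry makes about increasing the reference counts.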
> >>> > >>> Barry > >>> > >>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S < > amneetb at live.unc.edu> wrote: > >>>> > >>>> > >>>> > >>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith wrote: > >>>>> > >>>>> Either way is fine so long as I don't have to install a ton of > stuff; which it sounds like I won?t. > >>>> > >>>> http://hpctoolkit.org/download/hpcviewer/ > >>>> > >>>> Unzip HPCViewer for MacOSX with command line and drag the unzipped > folder to Applications. You will be able to > >>>> fire HPCViewer from LaunchPad. Point it to this attached directory. > You will be able to see three different kind of profiling > >>>> under Calling Context View, Callers View and Flat View. > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jan 18 08:29:30 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 18 Jan 2016 15:29:30 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: > > > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: > > > > Hoang Giang Bui writes: > >> One more question I like to ask, which is more on the performance of the > >> solver. That if I have a coupled problem, says the point block is [u_x > u_y > >> u_z p] in which entries of p block in stiffness matrix is in a much > smaller > >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still scale? > > > > You should scale the model (as Barry says). But the names of your > > variables suggest that the system is a saddle point problem, in which > > case there's a good chance AMG won't work at all. For example, > > BoomerAMG produces a singular preconditioner in similar contexts, such > > that the preconditioned residual drops smoothly while the true residual > > stagnates (the equations are not solved at all). So be vary careful if > > you think it's "working". > > Using block size 4 with the scaling, the hypre AMG does not converge. So it's somehow right. > The PCFIEDSPLIT preconditioner is designed for helping to solve saddle > point problems. > > > Does PCFIELDSPLIT support variable block size? For example using P2/P1 discretization, the number of nodes carrying [u_x u_y u_z] is different with number of nodes carrying p. PCFieldSplitSetBlockSize would not be correct in this case. Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 18 08:58:10 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 18 Jan 2016 08:58:10 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: On Mon, Jan 18, 2016 at 8:29 AM, Hoang Giang Bui wrote: > > > On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: > >> >> > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: >> > >> > Hoang Giang Bui writes: >> >> One more question I like to ask, which is more on the performance of >> the >> >> solver. 
That if I have a coupled problem, says the point block is [u_x >> u_y >> >> u_z p] in which entries of p block in stiffness matrix is in a much >> smaller >> >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still >> scale? >> > >> > You should scale the model (as Barry says). But the names of your >> > variables suggest that the system is a saddle point problem, in which >> > case there's a good chance AMG won't work at all. For example, >> > BoomerAMG produces a singular preconditioner in similar contexts, such >> > that the preconditioned residual drops smoothly while the true residual >> > stagnates (the equations are not solved at all). So be vary careful if >> > you think it's "working". >> >> > > Using block size 4 with the scaling, the hypre AMG does not converge. So > it's somehow right. > > > >> The PCFIEDSPLIT preconditioner is designed for helping to solve saddle >> point problems. >> >> >> > > Does PCFIELDSPLIT support variable block size? For example using P2/P1 > discretization, the number of nodes carrying [u_x u_y u_z] is different > with number of nodes carrying p. PCFieldSplitSetBlockSize would not be > correct in this case. > You misunderstand the blocking. You would put ALL velocities (P2) in one block and ALL pressure (P1) in another. The PCFieldSplitSetBlockSize() call is for co-located discretizations, which P2/P2 is not. Matt > Giang > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Mon Jan 18 09:42:00 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 18 Jan 2016 16:42:00 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: Why P2/P2 is not for co-located discretization? However, it's not my question. The P2/P1 which I used generate variable block size at each node. That was fine if I used PCFieldSplitSetIS for each components, displacements and pressures. But how to set the block size (3) for displacement block? Giang On Mon, Jan 18, 2016 at 3:58 PM, Matthew Knepley wrote: > On Mon, Jan 18, 2016 at 8:29 AM, Hoang Giang Bui > wrote: > >> >> >> On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: >> >>> >>> > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: >>> > >>> > Hoang Giang Bui writes: >>> >> One more question I like to ask, which is more on the performance of >>> the >>> >> solver. That if I have a coupled problem, says the point block is >>> [u_x u_y >>> >> u_z p] in which entries of p block in stiffness matrix is in a much >>> smaller >>> >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still >>> scale? >>> > >>> > You should scale the model (as Barry says). But the names of your >>> > variables suggest that the system is a saddle point problem, in which >>> > case there's a good chance AMG won't work at all. For example, >>> > BoomerAMG produces a singular preconditioner in similar contexts, such >>> > that the preconditioned residual drops smoothly while the true residual >>> > stagnates (the equations are not solved at all). So be vary careful if >>> > you think it's "working". >>> >>> >> >> Using block size 4 with the scaling, the hypre AMG does not converge. So >> it's somehow right. 
>> >> >> >>> The PCFIEDSPLIT preconditioner is designed for helping to solve >>> saddle point problems. >>> >>> >>> >> >> Does PCFIELDSPLIT support variable block size? For example using P2/P1 >> discretization, the number of nodes carrying [u_x u_y u_z] is different >> with number of nodes carrying p. PCFieldSplitSetBlockSize would not be >> correct in this case. >> > > You misunderstand the blocking. You would put ALL velocities (P2) in one > block and ALL pressure (P1) in another. > The PCFieldSplitSetBlockSize() call is for co-located discretizations, > which P2/P2 is not. > > Matt > > >> Giang >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 18 09:54:56 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 18 Jan 2016 09:54:56 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: On Mon, Jan 18, 2016 at 9:42 AM, Hoang Giang Bui wrote: > Why P2/P2 is not for co-located discretization? However, it's not my > question. The P2/P1 which I used generate variable block size at each node. > That was fine if I used PCFieldSplitSetIS for each components, > displacements and pressures. But how to set the block size (3) for > displacement block? > P2/P1 does not generate block matrices, and is not col-located, because the variables are located at different sets of nodes. You can use PCFieldSplitSetIS() to specify the splits. This is the right method for P2/P1. Setting the block size for the P2 block is not crucial. When its working we can do that. Matt > Giang > > On Mon, Jan 18, 2016 at 3:58 PM, Matthew Knepley > wrote: > >> On Mon, Jan 18, 2016 at 8:29 AM, Hoang Giang Bui >> wrote: >> >>> >>> >>> On Thu, Jan 14, 2016 at 8:08 PM, Barry Smith wrote: >>> >>>> >>>> > On Jan 14, 2016, at 12:57 PM, Jed Brown wrote: >>>> > >>>> > Hoang Giang Bui writes: >>>> >> One more question I like to ask, which is more on the performance of >>>> the >>>> >> solver. That if I have a coupled problem, says the point block is >>>> [u_x u_y >>>> >> u_z p] in which entries of p block in stiffness matrix is in a much >>>> smaller >>>> >> scale than u (p~1e-6, u~1e+8), then AMG with hypre in PETSc still >>>> scale? >>>> > >>>> > You should scale the model (as Barry says). But the names of your >>>> > variables suggest that the system is a saddle point problem, in which >>>> > case there's a good chance AMG won't work at all. For example, >>>> > BoomerAMG produces a singular preconditioner in similar contexts, such >>>> > that the preconditioned residual drops smoothly while the true >>>> residual >>>> > stagnates (the equations are not solved at all). So be vary careful >>>> if >>>> > you think it's "working". >>>> >>>> >>> >>> Using block size 4 with the scaling, the hypre AMG does not converge. So >>> it's somehow right. >>> >>> >>> >>>> The PCFIEDSPLIT preconditioner is designed for helping to solve >>>> saddle point problems. >>>> >>>> >>>> >>> >>> Does PCFIELDSPLIT support variable block size? For example using P2/P1 >>> discretization, the number of nodes carrying [u_x u_y u_z] is different >>> with number of nodes carrying p. PCFieldSplitSetBlockSize would not be >>> correct in this case. >>> >> >> You misunderstand the blocking. 
You would put ALL velocities (P2) in one >> block and ALL pressure (P1) in another. >> The PCFieldSplitSetBlockSize() call is for co-located discretizations, >> which P2/P2 is not. >> >> Matt >> >> >>> Giang >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Jan 18 10:25:42 2016 From: jed at jedbrown.org (Jed Brown) Date: Mon, 18 Jan 2016 09:25:42 -0700 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> Message-ID: <87si1ug8hl.fsf@jedbrown.org> Hoang Giang Bui writes: > Why P2/P2 is not for co-located discretization? Matt typed "P2/P2" when me meant "P2/P1". -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From tabrezali at gmail.com Tue Jan 19 17:07:12 2016 From: tabrezali at gmail.com (Tabrez Ali) Date: Tue, 19 Jan 2016 17:07:12 -0600 Subject: [petsc-users] external packages Message-ID: <569EC1A0.3020603@gmail.com> Hello W.r.t. to external packages, does "--download-xyz=yes" implicitly means "--with-xyz=1" Also, is "--download-xyz=yes" exactly same as "--download-xyz" Regards, Tabrez From jed at jedbrown.org Tue Jan 19 17:10:56 2016 From: jed at jedbrown.org (Jed Brown) Date: Tue, 19 Jan 2016 16:10:56 -0700 Subject: [petsc-users] external packages In-Reply-To: <569EC1A0.3020603@gmail.com> References: <569EC1A0.3020603@gmail.com> Message-ID: <8737ttdv27.fsf@jedbrown.org> Tabrez Ali writes: > Hello > > W.r.t. to external packages, does "--download-xyz=yes" implicitly means > "--with-xyz=1" Yes. > Also, is "--download-xyz=yes" exactly same as "--download-xyz" Yes, though --download-xyz=/path/to/xyz.tar.gz has semantic meaning. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From salazardetro1 at llnl.gov Wed Jan 20 11:35:29 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 20 Jan 2016 17:35:29 +0000 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity Message-ID: Hello I am trying to speed up a two dimensional linear elasticity problem with isotropic and heterogeneous properties. It is a topology optimization problem, therefore some regions have an almost zero stiffness whereas other regions have a higher value, making the matrix ill-conditioned. So far, from having searched mail lists on similar problems, I have come up with the following CL options to pass to the petsc solver (two dimensional problem): -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 It works reasonably well and shows similar number of iterations for different levels of refinement. However, it does not converge when I use the same options for KSPSolveTranspose(). 
I obtain DIVERGED_INDEFINITE_PC after three iterations. I believe this has to do with the field split, but I do not where to start. I am using libMesh which interfaces with petsc through the file petsc_linear solver.C (http://libmesh.github.io/doxygen/classlibMesh_1_1PetscLinearSolver.html#a4e66cc138b52e80e93a75e55315245ee) The KSPSolveTranspose() is called in adjoint_solve(). Changing that to KSPSolve() solves the issue and to me it is not a problem because my matrix is symmetric, but I don?t want to have to change it in the libMesh source code. So the question is, why do those CL options not work for the KSPSolveTranspose() despite having a symmetric matrix? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Jan 20 11:47:59 2016 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 20 Jan 2016 11:47:59 -0600 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: Message-ID: On Wed, Jan 20, 2016 at 11:35 AM, Salazar De Troya, Miguel < salazardetro1 at llnl.gov> wrote: > Hello > > I am trying to speed up a two dimensional linear elasticity problem with > isotropic and heterogeneous properties. It is a topology optimization > problem, therefore some regions have an almost zero stiffness whereas other > regions have a higher value, making the matrix ill-conditioned. So far, > from having searched mail lists on similar problems, I have come up with > the following CL options to pass to the petsc solver (two dimensional > problem): > > -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 > -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg > -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 > -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 > > It works reasonably well and shows similar number of iterations for > different levels of refinement. However, it does not converge when I use > the same options for KSPSolveTranspose(). I obtain DIVERGED_INDEFINITE_PC > after three iterations. I believe this has to do with the field split, > but I do not where to start. I am using libMesh which interfaces with petsc > through the file petsc_linear solver.C ( > http://libmesh.github.io/doxygen/classlibMesh_1_1PetscLinearSolver.html#a4e66cc138b52e80e93a75e55315245ee) > The KSPSolveTranspose() is called in adjoint_solve(). Changing that to > KSPSolve() solves the issue and to me it is not a problem because my matrix > is symmetric, but I don?t want to have to change it in the libMesh source > code. So the question is, why do those CL options not work for the > KSPSolveTranspose() despite having a symmetric matrix? > 1) Are you sure the matrix itself is symmetric? It could have boundary conditions that break this symmetry. 2) This sounds like a bug in PCApplyTranspose_FieldSplit() since I am almost certain it is not tested. 3) Can you send the matrix and rhs? This should be easy by using MatView() for a binary viewer. Thanks, Matt > Thanks > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
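For anyone wanting to reproduce the two checks Matt asks for above, a small sketch of the standard way to test symmetry and dump the matrix and right-hand side to PETSc binary files. The file names are arbitrary (they just match the attachments that follow), and the symmetry tolerance is an arbitrary choice.

#include <petscksp.h>

/* Sketch: check symmetry, then write A and b so they can be reloaded
   later with MatLoad()/VecLoad(). */
PetscErrorCode dump_system(Mat A, Vec b)
{
  PetscViewer    viewer;
  PetscBool      symm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatIsSymmetric(A, 1.0e-12, &symm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Matrix symmetric: %s\n", symm ? "yes" : "no");CHKERRQ(ierr);

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "stiffness_matrix", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = MatView(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs_vector", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(b, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The resulting files can be read back by opening a binary viewer with FILE_MODE_READ and calling MatLoad()/VecLoad().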
URL: From salazardetro1 at llnl.gov Wed Jan 20 12:22:19 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 20 Jan 2016 18:22:19 +0000 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: Message-ID: I am not 100% confident because I am not sure how libMesh handles the Dirichlet boundary conditions, I checked it with MatIsSymmetric() and obtained 1 as the response. Please find attached the binary files (I set up the viewer with PetscViewerSetFormat(viewer, PETSC_VIEWER_DEFAULT); hope that?s the right way) Thanks From: Matthew Knepley > Date: Wednesday, January 20, 2016 at 9:47 AM To: Miguel Salazar > Cc: "petsc-users at mcs.anl.gov" > Subject: Re: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity On Wed, Jan 20, 2016 at 11:35 AM, Salazar De Troya, Miguel > wrote: Hello I am trying to speed up a two dimensional linear elasticity problem with isotropic and heterogeneous properties. It is a topology optimization problem, therefore some regions have an almost zero stiffness whereas other regions have a higher value, making the matrix ill-conditioned. So far, from having searched mail lists on similar problems, I have come up with the following CL options to pass to the petsc solver (two dimensional problem): -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 It works reasonably well and shows similar number of iterations for different levels of refinement. However, it does not converge when I use the same options for KSPSolveTranspose(). I obtain DIVERGED_INDEFINITE_PC after three iterations. I believe this has to do with the field split, but I do not where to start. I am using libMesh which interfaces with petsc through the file petsc_linear solver.C (http://libmesh.github.io/doxygen/classlibMesh_1_1PetscLinearSolver.html#a4e66cc138b52e80e93a75e55315245ee) The KSPSolveTranspose() is called in adjoint_solve(). Changing that to KSPSolve() solves the issue and to me it is not a problem because my matrix is symmetric, but I don?t want to have to change it in the libMesh source code. So the question is, why do those CL options not work for the KSPSolveTranspose() despite having a symmetric matrix? 1) Are you sure the matrix itself is symmetric? It could have boundary conditions that break this symmetry. 2) This sounds like a bug in PCApplyTranspose_FieldSplit() since I am almost certain it is not tested. 3) Can you send the matrix and rhs? This should be easy by using MatView() for a binary viewer. Thanks, Matt Thanks -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rhs_vector Type: application/octet-stream Size: 105624 bytes Desc: rhs_vector URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rhs_vector.info Type: application/octet-stream Size: 22 bytes Desc: rhs_vector.info URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: stiffness_matrix Type: application/octet-stream Size: 2846472 bytes Desc: stiffness_matrix URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: stiffness_matrix.info Type: application/octet-stream Size: 22 bytes Desc: stiffness_matrix.info URL: From jed at jedbrown.org Wed Jan 20 18:36:09 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 20 Jan 2016 17:36:09 -0700 Subject: [petsc-users] [petsc-maint] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: Message-ID: <87lh7jbwg6.fsf@jedbrown.org> "Salazar De Troya, Miguel" writes: > I am not 100% confident because I am not sure how libMesh handles the > Dirichlet boundary conditions, How are you using libmesh? Normally you write the boundary conditions. Many of the examples use penalty conditions, which maintain symmetric but have other problems. > I am trying to speed up a two dimensional linear elasticity problem with isotropic and heterogeneous properties. It is a topology optimization problem, therefore some regions have an almost zero stiffness whereas other regions have a higher value, making the matrix ill-conditioned. So far, from having searched mail lists on similar problems, I have come up with the following CL options to pass to the petsc solver (two dimensional problem): > > -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 This option looks funny. What are you trying to do here? > -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From salazardetro1 at llnl.gov Thu Jan 21 10:17:11 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Thu, 21 Jan 2016 16:17:11 +0000 Subject: [petsc-users] [petsc-maint] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: <87lh7jbwg6.fsf@jedbrown.org> References: <87lh7jbwg6.fsf@jedbrown.org> Message-ID: I write the boundary conditions using their DirichletBoundary class, not the penalty term. The options I?m using are ones that I found in the libMesh mail list from a user who suggested them for elasticity problems. The idea he mentioned was to use field split to separate each field of the displacement vector solution. I honestly do not know the role of -pc_fieldsplit_type symmetric_multiplicative, but it was working for me. On 1/20/16, 4:36 PM, "Jed Brown" wrote: >"Salazar De Troya, Miguel" writes: > >> I am not 100% confident because I am not sure how libMesh handles the >> Dirichlet boundary conditions, > >How are you using libmesh? Normally you write the boundary conditions. >Many of the examples use penalty conditions, which maintain symmetric >but have other problems. > >> I am trying to speed up a two dimensional linear elasticity problem >>with isotropic and heterogeneous properties. It is a topology >>optimization problem, therefore some regions have an almost zero >>stiffness whereas other regions have a higher value, making the matrix >>ill-conditioned. 
So far, from having searched mail lists on similar >>problems, I have come up with the following CL options to pass to the >>petsc solver (two dimensional problem): >> >> -ksp_type cg -pc_type fieldsplit -pc_fieldsplit_block_size 2 >>-fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg >>-fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7 -pc_fieldsplit_0 0,1 > >This option looks funny. What are you trying to do here? > >> -pc_fieldsplit_type symmetric_multiplicative -ksp_atol 1e-10 From ptbauman at gmail.com Thu Jan 21 10:32:11 2016 From: ptbauman at gmail.com (Paul T. Bauman) Date: Thu, 21 Jan 2016 11:32:11 -0500 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior Message-ID: Greetings, We have a test that has started failing upon switching from 3.5.4 to 3.6.0 (actually went straight to 3.6.3 but checked this is repeatable with 3.6.0). I've attached the matrix generated with -mat_view binary and a small PETSc program that runs in serial that reproduces the behavior by loading the matrix and solving a linear system (RHS doesn't matter here). For context, this matrix is the Jacobian of a Taylor-Hood approximation of the axisymmetric incompressible Navier-Stokes equations for flow between concentric cylinders (for which there is an exact solution). The matrix is for a two element case, hopefully small enough for debugging. Using the following command line options with the test program works with PETSc 3.5.4 and gives a NAN residual with PETSc 3.6.0: PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4" If I remove the mat ordering option, all is well again in PETSc 3.6.x: PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_levels 4" Those options are nothing special. They were arrived at through trial/error to get decent behavior for the solver on up to 4 processors to keep the time to something reasonable for the test suite without getting really fancy. Specifically, we'd noticed this mat ordering on some problems in the test suite behaved noticeably better. As always, thanks for your time. Best, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.mat Type: application/octet-stream Size: 33024 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.mat.info Type: application/octet-stream Size: 65 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_mat.c Type: text/x-csrc Size: 1190 bytes Desc: not available URL: From hzhang at mcs.anl.gov Thu Jan 21 11:16:54 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 21 Jan 2016 11:16:54 -0600 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: Paul : Using petsc-dev (we recently added feature for better displaying convergence behavior), I found that '-sub_pc_factor_mat_ordering_type 1wd' causes zero pivot: ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4 -ksp_converged_reason Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0 PCSETUP_FAILED due to SUBPC_ERROR Number of iterations = 0 adding option '-info |grep zero' [0] MatPivotCheck_none(): Detected zero pivot in factorization in row 0 value 0. 
tolerance 2.22045e-14 or '-ksp_error_if_not_converged' [0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot [0]PETSC ERROR: Zero pivot row 0 value 0. tolerance 2.22045e-14 '-sub_pc_factor_mat_ordering_type natural' avoids it: ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type natural -sub_pc_factor_levels 4 -ksp_converged_reason Linear solve converged due to CONVERGED_RTOL iterations 1 Number of iterations = 1 Residual norm < 1.e-12 Hong Greetings, > > We have a test that has started failing upon switching from 3.5.4 to 3.6.0 > (actually went straight to 3.6.3 but checked this is repeatable with > 3.6.0). I've attached the matrix generated with -mat_view binary and a > small PETSc program that runs in serial that reproduces the behavior by > loading the matrix and solving a linear system (RHS doesn't matter here). > For context, this matrix is the Jacobian of a Taylor-Hood approximation of > the axisymmetric incompressible Navier-Stokes equations for flow between > concentric cylinders (for which there is an exact solution). The matrix is > for a two element case, hopefully small enough for debugging. > > Using the following command line options with the test program works with > PETSc 3.5.4 and gives a NAN residual with PETSc 3.6.0: > > PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu > -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4" > > If I remove the mat ordering option, all is well again in PETSc 3.6.x: > > PETSC_OPTIONS="-pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu > -sub_pc_factor_levels 4" > > Those options are nothing special. They were arrived at through > trial/error to get decent behavior for the solver on up to 4 processors to > keep the time to something reasonable for the test suite without getting > really fancy. Specifically, we'd noticed this mat ordering on some problems > in the test suite behaved noticeably better. > > As always, thanks for your time. > > Best, > > Paul > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ptbauman at gmail.com Thu Jan 21 11:34:17 2016 From: ptbauman at gmail.com (Paul T. Bauman) Date: Thu, 21 Jan 2016 12:34:17 -0500 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: Thanks Hong, On Thu, Jan 21, 2016 at 12:16 PM, Hong wrote: > Paul : > Using petsc-dev (we recently added feature for better displaying > convergence behavior), > OK, good to know, thanks. > I found that '-sub_pc_factor_mat_ordering_type 1wd' causes zero pivot: > I figured it was something along these lines. So, just so I'm clear, likely this zero pivot was always there with this mat ordering (i.e. no mat ordering bits actually changed between 3.5 and 3.6) and this is a reflection of increased consistency checking in the newer PETSc? Thanks much, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jan 21 11:51:38 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 21 Jan 2016 11:51:38 -0600 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: Paul: It might be caused by our changes in default shift strategy. We previously used '-pc_factor_shift_type NONZERO' for ilu, then changed to '-pc_factor_shift_type NONE'. 
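If it is more convenient to request the shift from code than from the options database, here is a minimal sketch in C (ksp is an assumed, already configured KSP; for the ASM sub-solvers the -sub_pc_factor_shift_type option remains the simplest route):

#include <petscksp.h>

PC             pc;
PetscErrorCode ierr;
/* ask the factorization to shift zero pivots, as the pre-3.6 ilu default did */
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCFactorSetShiftType(pc,MAT_SHIFT_NONZERO);CHKERRQ(ierr);

MAT_SHIFT_INBLOCKS and MAT_SHIFT_POSITIVE_DEFINITE can be requested the same way.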
For your test, I get ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4 -ksp_converged_reason -sub_pc_factor_shift_type NONZERO Linear solve converged due to CONVERGED_RTOL iterations 2 Number of iterations = 2 Residual norm 0.0116896 with '-sub_pc_factor_shift_type INBLOCKS': Linear solve converged due to CONVERGED_RTOL iterations 2 Number of iterations = 2 Residual norm 0.00603736 I guess your previous run might use one of these options. Hong Thanks Hong, > > On Thu, Jan 21, 2016 at 12:16 PM, Hong wrote: > >> Paul : >> Using petsc-dev (we recently added feature for better displaying >> convergence behavior), >> > > OK, good to know, thanks. > > >> I found that '-sub_pc_factor_mat_ordering_type 1wd' causes zero pivot: >> > > I figured it was something along these lines. So, just so I'm clear, > likely this zero pivot was always there with this mat ordering (i.e. no mat > ordering bits actually changed between 3.5 and 3.6) and this is a > reflection of increased consistency checking in the newer PETSc? > > Thanks much, > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ptbauman at gmail.com Thu Jan 21 11:59:55 2016 From: ptbauman at gmail.com (Paul T. Bauman) Date: Thu, 21 Jan 2016 12:59:55 -0500 Subject: [petsc-users] 3.5 -> 3.6 Change in MatOrdering Behavior In-Reply-To: References: Message-ID: On Thu, Jan 21, 2016 at 12:51 PM, Hong wrote: > Paul: > It might be caused by our changes in default shift strategy. > We previously used '-pc_factor_shift_type NONZERO' for ilu, then changed > to '-pc_factor_shift_type NONE'. > For your test, I get > ./ex10 -f0 test.mat -rhs 0 -pc_type asm -pc_asm_overlap 12 -sub_pc_type > ilu -sub_pc_factor_mat_ordering_type 1wd -sub_pc_factor_levels 4 > -ksp_converged_reason -sub_pc_factor_shift_type NONZERO > Ah! That was the change. Confirmed, adding -sub_pc_factor_shift_type nonzero gets our test passing again with those options. Mystery solved. Thank you very much. Best, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From wen.zhao at outlook.fr Thu Jan 21 17:11:03 2016 From: wen.zhao at outlook.fr (wen zhao) Date: Fri, 22 Jan 2016 00:11:03 +0100 Subject: [petsc-users] addition of two matrix Message-ID: Hello, I want to add to matrix, but i haven't found a function which can do this operation. Is there existe a kind of operation can do C = A + B Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu Jan 21 17:43:11 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 22 Jan 2016 00:43:11 +0100 Subject: [petsc-users] addition of two matrix In-Reply-To: References: Message-ID: Try this http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatAXPY.html On 22 January 2016 at 00:11, wen zhao wrote: > Hello, > > I want to add to matrix, but i haven't found a function which can do this > operation. Is there existe a kind of operation can do C = A + B > > Thanks > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 21 18:52:56 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 22 Jan 2016 00:52:56 +0000 Subject: [petsc-users] Runtime for ILU(k) vs. LU Message-ID: <2A1B4293-6A96-4533-8405-85DF4D955880@ad.unc.edu> Hi Folks, Is there a general rule for runtime of ILU(k) vs. LU for some higher level k? 
In other words, after what value of 'k' one would be better off using LU in the preconditioner than ILU(k). Thanks, --Amneet From bsmith at mcs.anl.gov Thu Jan 21 19:03:39 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 21 Jan 2016 19:03:39 -0600 Subject: [petsc-users] Runtime for ILU(k) vs. LU In-Reply-To: <2A1B4293-6A96-4533-8405-85DF4D955880@ad.unc.edu> References: <2A1B4293-6A96-4533-8405-85DF4D955880@ad.unc.edu> Message-ID: <64BB108C-E291-40C1-83D8-F136710C2B5F@mcs.anl.gov> If ILU(0, 1, or 2) doesn't work well then ILU( n > 2) generally doesn't work well. > On Jan 21, 2016, at 6:52 PM, Bhalla, Amneet Pal S wrote: > > > Hi Folks, > > Is there a general rule for runtime of ILU(k) vs. LU for some higher level k? In other words, after what value of 'k' one > would be better off using LU in the preconditioner than ILU(k). > > Thanks, > --Amneet From jed at jedbrown.org Thu Jan 21 21:48:08 2016 From: jed at jedbrown.org (Jed Brown) Date: Thu, 21 Jan 2016 19:48:08 -0800 Subject: [petsc-users] Preconditioners for KSPSolveTranspose() in linear elasticity In-Reply-To: References: <87lh7jbwg6.fsf@jedbrown.org> Message-ID: <87k2n28ebr.fsf@jedbrown.org> "Salazar De Troya, Miguel" writes: > I write the boundary conditions using their DirichletBoundary class, not > the penalty term. > The options I?m using are ones that I found in the libMesh mail list > from a user who suggested them for elasticity problems. The idea he > mentioned was to use field split to separate each field of the > displacement vector solution. I honestly do not know the role of > -pc_fieldsplit_type symmetric_multiplicative, but it was working for > me. I would use GAMG or ML (without fieldsplit; set a near null space) instead of Hypre, because the algorithm is usually better for elasticity. Fieldsplit can work for this, but the performance degrades for higher Poisson ratio. In any case, "-pc_fieldsplit_0 0,1" is ignored (I think) and "-pc_fieldsplit_0_fields 0,1" is likely not what you want. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From hgbk2008 at gmail.com Fri Jan 22 03:40:06 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Fri, 22 Jan 2016 10:40:06 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: <87si1ug8hl.fsf@jedbrown.org> References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Hi Matt I would rather like to set the block size for block P2 too. Why? Because in one of my test (for problem involves only [u_x u_y u_z]), the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it increases to 140 if block size is 1 (see attached files). This gives me the impression that AMG will give better inversion for "P2" block if I can set its block size to 3. Of course it's still an hypothesis but worth to try. Another question: In one of the Petsc presentation, you said the Hypre AMG does not scale well, because set up cost amortize the iterations. How is it quantified? and what is the memory overhead? Giang On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Why P2/P2 is not for co-located discretization? > > Matt typed "P2/P2" when me meant "P2/P1". > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
-------------- next part --------------
Mat BlockSize: 1
  0 KSP preconditioned resid norm 1.911887586816e+01 true resid norm 1.379276869721e+08 ||r(i)||/||b|| 1.000000000000e+00
  ...
138 KSP preconditioned resid norm 1.701096972675e-08 true resid norm 2.979989310769e+00 ||r(i)||/||b|| 2.160544685544e-08
Linear solve converged due to CONVERGED_RTOL iterations 138
KSP Object: 8 MPI processes, type: gmres (restart=300, Modified Gram-Schmidt), maximum iterations=300, tolerances: relative=1e-09, absolute=1e-20, left preconditioning, PRECONDITIONED norm
PC Object: 8 MPI processes, type: hypre (BoomerAMG, V-cycle, strong threshold 0.25, PMIS coarsening, classical interpolation, symmetric-SOR/Jacobi relaxation)
Mat Object: 8 MPI processes, type: mpiaij, rows=657685, cols=657685, total nonzeros=1.19268e+08
KSPSolve completed
-------------- next part --------------
Mat BlockSize: 3
  0 KSP preconditioned resid norm 3.922843899310e+01 true resid norm 1.379276869721e+08 ||r(i)||/||b|| 1.000000000000e+00
  ...
 50 KSP preconditioned resid norm 2.931961034973e-08 true resid norm 2.958079043384e+01 ||r(i)||/||b|| 2.144659356161e-07
Linear solve converged due to CONVERGED_RTOL iterations 50
KSP Object: 8 MPI processes, type: gmres (restart=300, Modified Gram-Schmidt), maximum iterations=300, tolerances: relative=1e-09, absolute=1e-20, left preconditioning, PRECONDITIONED norm
PC Object: 8 MPI processes, type: hypre (BoomerAMG, V-cycle, strong threshold 0.25, PMIS coarsening, classical interpolation, symmetric-SOR/Jacobi relaxation)
Mat Object: 8 MPI processes, type: mpiaij, rows=670170, cols=670170, bs=3, total nonzeros=1.22417e+08
KSPSolve completed
From knepley at gmail.com Fri Jan 22 04:15:48 2016
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 22 Jan 2016 04:15:48 -0600
Subject: [petsc-users] Why use MATMPIBAIJ?
In-Reply-To:
References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org>
Message-ID:
On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui wrote:
> Hi Matt
> I would rather like to set the block size for block P2 too. Why?
>
> Because in one of my test (for problem involves only [u_x u_y u_z]), the
> gmres + Hypre AMG converges in 50 steps with block size 3, whereby it
> increases to 140 if block size is 1 (see attached files).
>
You can still do that. It can be done with options once the decomposition
is working. It's true that these solvers work better with the block size
set. However, if it's the P2 Laplacian it does not really matter, since it
is uncoupled.

> This gives me the impression that AMG will give better inversion for "P2"
> block if I can set its block size to 3. Of course it's still an hypothesis
> but worth to try.
>
> Another question: In one of the Petsc presentation, you said the Hypre AMG
> does not scale well, because set up cost amortize the iterations. How is it
> quantified? and what is the memory overhead?
>
I said the Hypre setup cost is not scalable, but it can be amortized over
the iterations. You can quantify this just by looking at the PCSetUp time
as you increase the number of processes. I don't think they have a good
model for the memory usage, and if they do, I do not know what it is.
However, generally Hypre takes more memory than the agglomeration MG like
ML or GAMG.
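A rough way to quantify it on your own problem (a sketch, assuming an already assembled system with an application-provided ksp, b and x): put the setup and the solve in separate log stages, rerun at a few process counts with -log_summary, and compare how the two stage times grow.

#include <petscksp.h>

PetscLogStage  stage_setup, stage_solve;
PetscErrorCode ierr;
ierr = PetscLogStageRegister("AMG setup",&stage_setup);CHKERRQ(ierr);
ierr = PetscLogStageRegister("AMG solve",&stage_solve);CHKERRQ(ierr);
/* KSPSetUp triggers PCSetUp, which is where the BoomerAMG hierarchy is built */
ierr = PetscLogStagePush(stage_setup);CHKERRQ(ierr);
ierr = KSPSetUp(ksp);CHKERRQ(ierr);
ierr = PetscLogStagePop();CHKERRQ(ierr);
/* the Krylov iterations then amortize that one-time cost */
ierr = PetscLogStagePush(stage_solve);CHKERRQ(ierr);
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
ierr = PetscLogStagePop();CHKERRQ(ierr);

If the setup stage grows much faster than the solve stage as processes are added, the setup is the bottleneck; if it stays a small fraction of the total, it is being amortized.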
Thanks, Matt > > Giang > > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > >> Hoang Giang Bui writes: >> >> > Why P2/P2 is not for co-located discretization? >> >> Matt typed "P2/P2" when me meant "P2/P1". >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Fri Jan 22 07:27:38 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Fri, 22 Jan 2016 14:27:38 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: DO you mean the option pc_fieldsplit_block_size? In this thread: http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error It assumes you have a constant number of fields at each grid point, am I right? However, my field split is not constant, like [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y u3_z p_3 u4_x u4_y u4_z] Subsequently the fieldsplit is [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z u4_x u4_y u4_z] [p_1 p_3] Then what is the option to set block size 3 for split 0? Sorry, I search several forum threads but cannot figure out the options as you said. > You can still do that. It can be done with options once the decomposition > is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > Yes, I agree it's uncoupled with the other field, but the crucial factor defining the quality of the block preconditioner is the approximate inversion of individual block. I would merely try block Jacobi first, because it's quite simple. Nevertheless, fieldsplit implements other nice things, like Schur complement, etc. Giang On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui > wrote: > >> Hi Matt >> I would rather like to set the block size for block P2 too. Why? >> >> Because in one of my test (for problem involves only [u_x u_y u_z]), the >> gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >> increases to 140 if block size is 1 (see attached files). >> > > You can still do that. It can be done with options once the decomposition > is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > This gives me the impression that AMG will give better inversion for "P2" >> block if I can set its block size to 3. Of course it's still an hypothesis >> but worth to try. >> >> Another question: In one of the Petsc presentation, you said the Hypre >> AMG does not scale well, because set up cost amortize the iterations. How >> is it quantified? and what is the memory overhead? >> > > I said the Hypre setup cost is not scalable, but it can be amortized over > the iterations. You can quantify this > just by looking at the PCSetUp time as your increase the number of > processes. I don't think they have a good > model for the memory usage, and if they do, I do not know what it is. > However, generally Hypre takes more > memory than the agglomeration MG like ML or GAMG. 
> > Thanks, > > Matt > > >> >> Giang >> >> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> >>> Hoang Giang Bui writes: >>> >>> > Why P2/P2 is not for co-located discretization? >>> >>> Matt typed "P2/P2" when me meant "P2/P1". >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 07:57:22 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 07:57:22 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui wrote: > DO you mean the option pc_fieldsplit_block_size? In this thread: > > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error > No. "Block Size" is confusing on PETSc since it is used to do several things. Here block size is being used to split the matrix. You do not need this since you are prescribing your splits. The matrix block size is used two ways: 1) To indicate that matrix values come in logically dense blocks 2) To change the storage to match this logical arrangement After everything works, we can just indicate to the submatrix which is extracted that it has a certain block size. However, for the Laplacian I expect it not to matter. > It assumes you have a constant number of fields at each grid point, am I > right? However, my field split is not constant, like > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y > u3_z p_3 u4_x u4_y u4_z] > > Subsequently the fieldsplit is > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z > u4_x u4_y u4_z] > [p_1 p_3] > > Then what is the option to set block size 3 for split 0? > > Sorry, I search several forum threads but cannot figure out the options as > you said. > > > >> You can still do that. It can be done with options once the decomposition >> is working. Its true that these solvers >> work better with the block size set. However, if its the P2 Laplacian it >> does not really matter since its uncoupled. >> >> Yes, I agree it's uncoupled with the other field, but the crucial factor > defining the quality of the block preconditioner is the approximate > inversion of individual block. I would merely try block Jacobi first, > because it's quite simple. Nevertheless, fieldsplit implements other nice > things, like Schur complement, etc. > I think concepts are getting confused here. I was talking about the interaction of components in one block (the P2 block). You are talking about interaction between blocks. Thanks, Matt > Giang > > > > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley > wrote: > >> On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >> wrote: >> >>> Hi Matt >>> I would rather like to set the block size for block P2 too. Why? >>> >>> Because in one of my test (for problem involves only [u_x u_y u_z]), the >>> gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >>> increases to 140 if block size is 1 (see attached files). >>> >> >> You can still do that. It can be done with options once the decomposition >> is working. Its true that these solvers >> work better with the block size set. However, if its the P2 Laplacian it >> does not really matter since its uncoupled. 
>> >> This gives me the impression that AMG will give better inversion for "P2" >>> block if I can set its block size to 3. Of course it's still an hypothesis >>> but worth to try. >>> >>> Another question: In one of the Petsc presentation, you said the Hypre >>> AMG does not scale well, because set up cost amortize the iterations. How >>> is it quantified? and what is the memory overhead? >>> >> >> I said the Hypre setup cost is not scalable, but it can be amortized over >> the iterations. You can quantify this >> just by looking at the PCSetUp time as your increase the number of >> processes. I don't think they have a good >> model for the memory usage, and if they do, I do not know what it is. >> However, generally Hypre takes more >> memory than the agglomeration MG like ML or GAMG. >> >> Thanks, >> >> Matt >> >> >>> >>> Giang >>> >>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>> >>>> Hoang Giang Bui writes: >>>> >>>> > Why P2/P2 is not for co-located discretization? >>>> >>>> Matt typed "P2/P2" when me meant "P2/P1". >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Jan 22 09:27:53 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 22 Jan 2016 10:27:53 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: > > > > I said the Hypre setup cost is not scalable, > I'd be a little careful here. Scaling for the matrix triple product is hard and hypre does put effort into scaling. I don't have any data however. Do you? > but it can be amortized over the iterations. You can quantify this > just by looking at the PCSetUp time as your increase the number of > processes. I don't think they have a good > model for the memory usage, and if they do, I do not know what it is. > However, generally Hypre takes more > memory than the agglomeration MG like ML or GAMG. > > agglomerations methods tend to have lower "grid complexity", that is smaller coarse grids, than classic AMG like in hypre. THis is more of a constant complexity and not a scaling issue though. You can address this with parameters to some extent. But for elasticity, you want to at least try, if not start with, GAMG or ML. > Thanks, > > Matt > > >> >> Giang >> >> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> >>> Hoang Giang Bui writes: >>> >>> > Why P2/P2 is not for co-located discretization? >>> >>> Matt typed "P2/P2" when me meant "P2/P1". >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 09:32:52 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 09:32:52 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: > >> >> I said the Hypre setup cost is not scalable, >> > > I'd be a little careful here. Scaling for the matrix triple product is > hard and hypre does put effort into scaling. I don't have any data > however. Do you? > I used it for PyLith and saw this. I did not think any AMG had scalable setup time. Matt > but it can be amortized over the iterations. You can quantify this >> just by looking at the PCSetUp time as your increase the number of >> processes. I don't think they have a good >> model for the memory usage, and if they do, I do not know what it is. >> However, generally Hypre takes more >> memory than the agglomeration MG like ML or GAMG. >> >> > agglomerations methods tend to have lower "grid complexity", that is > smaller coarse grids, than classic AMG like in hypre. THis is more of a > constant complexity and not a scaling issue though. You can address this > with parameters to some extent. But for elasticity, you want to at least > try, if not start with, GAMG or ML. > > >> Thanks, >> >> Matt >> >> >>> >>> Giang >>> >>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>> >>>> Hoang Giang Bui writes: >>>> >>>> > Why P2/P2 is not for co-located discretization? >>>> >>>> Matt typed "P2/P2" when me meant "P2/P1". >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 10:52:27 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 11:52:27 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Dear all, I take this opportunity to ask for your important suggestion. I am solving an elastic-acoustic-gravity equation on the planet. I have displacement vector (ux,uy,uz) in solid region, displacement potential (\xi) and pressure (p) in fluid region, and gravitational potential (\phi) in all of space. All these variables are coupled. Currently, I am using MATMPIAIJ and form a single global matrix. Does using a MATMPIBIJ or MATNEST improve the convergence/efficiency in this case? For your information, total degrees of freedoms are about a billion. Any suggestion would be greatly appreciated. Thanks, Hom Nath On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: >>> >>> >>> >>> I said the Hypre setup cost is not scalable, >> >> >> I'd be a little careful here. Scaling for the matrix triple product is >> hard and hypre does put effort into scaling. I don't have any data however. >> Do you? > > > I used it for PyLith and saw this. I did not think any AMG had scalable > setup time. > > Matt > >>> >>> but it can be amortized over the iterations. You can quantify this >>> just by looking at the PCSetUp time as your increase the number of >>> processes. 
I don't think they have a good >>> model for the memory usage, and if they do, I do not know what it is. >>> However, generally Hypre takes more >>> memory than the agglomeration MG like ML or GAMG. >>> >> >> agglomerations methods tend to have lower "grid complexity", that is >> smaller coarse grids, than classic AMG like in hypre. THis is more of a >> constant complexity and not a scaling issue though. You can address this >> with parameters to some extent. But for elasticity, you want to at least >> try, if not start with, GAMG or ML. >> >>> >>> Thanks, >>> >>> Matt >>> >>>> >>>> >>>> Giang >>>> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>>>> >>>>> Hoang Giang Bui writes: >>>>> >>>>> > Why P2/P2 is not for co-located discretization? >>>>> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >>>> >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >> >> > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Fri Jan 22 11:01:35 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 11:01:35 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti wrote: > Dear all, > > I take this opportunity to ask for your important suggestion. > > I am solving an elastic-acoustic-gravity equation on the planet. I > have displacement vector (ux,uy,uz) in solid region, displacement > potential (\xi) and pressure (p) in fluid region, and gravitational > potential (\phi) in all of space. All these variables are coupled. > > Currently, I am using MATMPIAIJ and form a single global matrix. Does > using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > this case? For your information, total degrees of freedoms are about a > billion. > 1) For any solver question, we need to see the output of -ksp_view, and we would also like -ksp_monitor_true_residual -ksp_converged_reason 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the blocksize which you could set without that format 3) However, you might see benefit from using something like PCFIELDSPLIT if you have multiphysics here Matt > Any suggestion would be greatly appreciated. > > Thanks, > Hom Nath > > On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: > >>> > >>> > >>> > >>> I said the Hypre setup cost is not scalable, > >> > >> > >> I'd be a little careful here. Scaling for the matrix triple product is > >> hard and hypre does put effort into scaling. I don't have any data > however. > >> Do you? > > > > > > I used it for PyLith and saw this. I did not think any AMG had scalable > > setup time. > > > > Matt > > > >>> > >>> but it can be amortized over the iterations. You can quantify this > >>> just by looking at the PCSetUp time as your increase the number of > >>> processes. I don't think they have a good > >>> model for the memory usage, and if they do, I do not know what it is. > >>> However, generally Hypre takes more > >>> memory than the agglomeration MG like ML or GAMG. 
> >>> > >> > >> agglomerations methods tend to have lower "grid complexity", that is > >> smaller coarse grids, than classic AMG like in hypre. THis is more of a > >> constant complexity and not a scaling issue though. You can address > this > >> with parameters to some extent. But for elasticity, you want to at least > >> try, if not start with, GAMG or ML. > >> > >>> > >>> Thanks, > >>> > >>> Matt > >>> > >>>> > >>>> > >>>> Giang > >>>> > >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > >>>>> > >>>>> Hoang Giang Bui writes: > >>>>> > >>>>> > Why P2/P2 is not for co-located discretization? > >>>>> > >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >>>> > >>>> > >>> > >>> > >>> > >>> -- > >>> What most experimenters take for granted before they begin their > >>> experiments is infinitely more interesting than any results to which > their > >>> experiments lead. > >>> -- Norbert Wiener > >> > >> > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 11:10:54 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 12:10:54 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks Matt. Attached detailed info on ksp of a much smaller test. This is a multiphysics problem. Hom Nath On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > wrote: >> >> Dear all, >> >> I take this opportunity to ask for your important suggestion. >> >> I am solving an elastic-acoustic-gravity equation on the planet. I >> have displacement vector (ux,uy,uz) in solid region, displacement >> potential (\xi) and pressure (p) in fluid region, and gravitational >> potential (\phi) in all of space. All these variables are coupled. >> >> Currently, I am using MATMPIAIJ and form a single global matrix. Does >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in >> this case? For your information, total degrees of freedoms are about a >> billion. > > > 1) For any solver question, we need to see the output of -ksp_view, and we > would also like > > -ksp_monitor_true_residual -ksp_converged_reason > > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the blocksize > which you > could set without that format > > 3) However, you might see benefit from using something like PCFIELDSPLIT if > you have multiphysics here > > Matt > >> >> Any suggestion would be greatly appreciated. >> >> Thanks, >> Hom Nath >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: >> >>> >> >>> >> >>> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple product is >> >> hard and hypre does put effort into scaling. I don't have any data >> >> however. >> >> Do you? >> > >> > >> > I used it for PyLith and saw this. I did not think any AMG had scalable >> > setup time. 
>> > >> > Matt >> > >> >>> >> >>> but it can be amortized over the iterations. You can quantify this >> >>> just by looking at the PCSetUp time as your increase the number of >> >>> processes. I don't think they have a good >> >>> model for the memory usage, and if they do, I do not know what it is. >> >>> However, generally Hypre takes more >> >>> memory than the agglomeration MG like ML or GAMG. >> >>> >> >> >> >> agglomerations methods tend to have lower "grid complexity", that is >> >> smaller coarse grids, than classic AMG like in hypre. THis is more of a >> >> constant complexity and not a scaling issue though. You can address >> >> this >> >> with parameters to some extent. But for elasticity, you want to at >> >> least >> >> try, if not start with, GAMG or ML. >> >> >> >>> >> >>> Thanks, >> >>> >> >>> Matt >> >>> >> >>>> >> >>>> >> >>>> Giang >> >>>> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> >>>>> >> >>>>> Hoang Giang Bui writes: >> >>>>> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >>>>> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >>>> >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> What most experimenters take for granted before they begin their >> >>> experiments is infinitely more interesting than any results to which >> >>> their >> >>> experiments lead. >> >>> -- Norbert Wiener >> >> >> >> >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener -------------- next part -------------- A non-text attachment was scrubbed... Name: ksplog Type: application/octet-stream Size: 14041 bytes Desc: not available URL: From knepley at gmail.com Fri Jan 22 11:16:22 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 11:16:22 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti wrote: > Thanks Matt. > > Attached detailed info on ksp of a much smaller test. This is a > multiphysics problem. > You are using FGMRES/ASM(ILU0). From your description below, this sounds like an elliptic system. I would at least try AMG (-pc_type gamg) to see how it does. Any other advice would have to be based on seeing the equations. Thanks, Matt > Hom Nath > > On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > > wrote: > >> > >> Dear all, > >> > >> I take this opportunity to ask for your important suggestion. > >> > >> I am solving an elastic-acoustic-gravity equation on the planet. I > >> have displacement vector (ux,uy,uz) in solid region, displacement > >> potential (\xi) and pressure (p) in fluid region, and gravitational > >> potential (\phi) in all of space. All these variables are coupled. > >> > >> Currently, I am using MATMPIAIJ and form a single global matrix. Does > >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > >> this case? For your information, total degrees of freedoms are about a > >> billion. 
> > > > > > 1) For any solver question, we need to see the output of -ksp_view, and > we > > would also like > > > > -ksp_monitor_true_residual -ksp_converged_reason > > > > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the > blocksize > > which you > > could set without that format > > > > 3) However, you might see benefit from using something like PCFIELDSPLIT > if > > you have multiphysics here > > > > Matt > > > >> > >> Any suggestion would be greatly appreciated. > >> > >> Thanks, > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: > >> >>> > >> >>> > >> >>> > >> >>> I said the Hypre setup cost is not scalable, > >> >> > >> >> > >> >> I'd be a little careful here. Scaling for the matrix triple product > is > >> >> hard and hypre does put effort into scaling. I don't have any data > >> >> however. > >> >> Do you? > >> > > >> > > >> > I used it for PyLith and saw this. I did not think any AMG had > scalable > >> > setup time. > >> > > >> > Matt > >> > > >> >>> > >> >>> but it can be amortized over the iterations. You can quantify this > >> >>> just by looking at the PCSetUp time as your increase the number of > >> >>> processes. I don't think they have a good > >> >>> model for the memory usage, and if they do, I do not know what it > is. > >> >>> However, generally Hypre takes more > >> >>> memory than the agglomeration MG like ML or GAMG. > >> >>> > >> >> > >> >> agglomerations methods tend to have lower "grid complexity", that is > >> >> smaller coarse grids, than classic AMG like in hypre. THis is more > of a > >> >> constant complexity and not a scaling issue though. You can address > >> >> this > >> >> with parameters to some extent. But for elasticity, you want to at > >> >> least > >> >> try, if not start with, GAMG or ML. > >> >> > >> >>> > >> >>> Thanks, > >> >>> > >> >>> Matt > >> >>> > >> >>>> > >> >>>> > >> >>>> Giang > >> >>>> > >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > wrote: > >> >>>>> > >> >>>>> Hoang Giang Bui writes: > >> >>>>> > >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >>>>> > >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >> >>>> > >> >>>> > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> What most experimenters take for granted before they begin their > >> >>> experiments is infinitely more interesting than any results to which > >> >>> their > >> >>> experiments lead. > >> >>> -- Norbert Wiener > >> >> > >> >> > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 11:47:47 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 12:47:47 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? 
In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks a lot. With AMG it did not converge within the iteration limit of 3000. In solid: elastic wave equation with added gravity term \rho \nabla\phi In fluid: acoustic wave equation with added gravity term \rho \nabla\phi Both solid and fluid: Poisson's equation for gravity Outer space: Laplace's equation for gravity We combine so called mapped infinite element with spectral-element method (higher order FEM that uses nodal quadrature) and solve in frequency domain. Hom Nath On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti > wrote: >> >> Thanks Matt. >> >> Attached detailed info on ksp of a much smaller test. This is a >> multiphysics problem. > > > You are using FGMRES/ASM(ILU0). From your description below, this sounds > like > an elliptic system. I would at least try AMG (-pc_type gamg) to see how it > does. Any > other advice would have to be based on seeing the equations. > > Thanks, > > Matt > >> >> Hom Nath >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> > wrote: >> >> >> >> Dear all, >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. I >> >> have displacement vector (ux,uy,uz) in solid region, displacement >> >> potential (\xi) and pressure (p) in fluid region, and gravitational >> >> potential (\phi) in all of space. All these variables are coupled. >> >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. Does >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in >> >> this case? For your information, total degrees of freedoms are about a >> >> billion. >> > >> > >> > 1) For any solver question, we need to see the output of -ksp_view, and >> > we >> > would also like >> > >> > -ksp_monitor_true_residual -ksp_converged_reason >> > >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the >> > blocksize >> > which you >> > could set without that format >> > >> > 3) However, you might see benefit from using something like PCFIELDSPLIT >> > if >> > you have multiphysics here >> > >> > Matt >> > >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> Thanks, >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams wrote: >> >> >>> >> >> >>> >> >> >>> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple product >> >> >> is >> >> >> hard and hypre does put effort into scaling. I don't have any data >> >> >> however. >> >> >> Do you? >> >> > >> >> > >> >> > I used it for PyLith and saw this. I did not think any AMG had >> >> > scalable >> >> > setup time. >> >> > >> >> > Matt >> >> > >> >> >>> >> >> >>> but it can be amortized over the iterations. You can quantify this >> >> >>> just by looking at the PCSetUp time as your increase the number of >> >> >>> processes. I don't think they have a good >> >> >>> model for the memory usage, and if they do, I do not know what it >> >> >>> is. >> >> >>> However, generally Hypre takes more >> >> >>> memory than the agglomeration MG like ML or GAMG. 
>> >> >>> >> >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", that is >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is more >> >> >> of a >> >> >> constant complexity and not a scaling issue though. You can address >> >> >> this >> >> >> with parameters to some extent. But for elasticity, you want to at >> >> >> least >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >>> >> >> >>> Thanks, >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>>> >> >> >>>> >> >> >>>> Giang >> >> >>>> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >>>> wrote: >> >> >>>>> >> >> >>>>> Hoang Giang Bui writes: >> >> >>>>> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >>>>> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >>>> >> >> >>>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> >>> experiments is infinitely more interesting than any results to >> >> >>> which >> >> >>> their >> >> >>> experiments lead. >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Fri Jan 22 12:07:04 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 12:07:04 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti wrote: > Thanks a lot. > > With AMG it did not converge within the iteration limit of 3000. > > In solid: elastic wave equation with added gravity term \rho \nabla\phi > In fluid: acoustic wave equation with added gravity term \rho \nabla\phi > Both solid and fluid: Poisson's equation for gravity > Outer space: Laplace's equation for gravity > > We combine so called mapped infinite element with spectral-element > method (higher order FEM that uses nodal quadrature) and solve in > frequency domain. > 1) The Poisson and Laplace equation should be using MG, however you are using SEM, so you would need to use a low order PC for the high order problem, also called p-MG (Paul Fischer), see http://epubs.siam.org/doi/abs/10.1137/110834512 2) The acoustic wave equation is Helmholtz to us, and that needs special MG tweaks that are still research material so I can understand using ASM. 3) Same thing for the elastic wave equations. Some people say they have this solved using hierarchical matrix methods, something like http://portal.nersc.gov/project/sparse/strumpack/ However, I think the jury is still out. If you can do 100 iterations of plain vanilla solvers, that seems like a win right now. You might improve the time using FS, but I am not sure about the iterations on the smaller problem. 
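Just to make the FS option concrete, here is a minimal sketch of attaching a
three-way split to the assembled MATMPIAIJ operator (the split names, the
index arrays, and their lengths below are placeholders; they have to come from
your own global DOF numbering, so treat this as a starting point rather than a
recipe):

#include <petscksp.h>

/* Sketch: register one split per physical field (u, chi/p, phi) on a KSP
   whose operator is the already-assembled MATMPIAIJ matrix.  The index
   arrays hold the locally owned global row numbers of each field. */
PetscErrorCode AttachFieldSplit(KSP ksp,
                                PetscInt nu,   const PetscInt iu[],
                                PetscInt nchi, const PetscInt ichi[],
                                PetscInt nphi, const PetscInt iphi[])
{
  PC             pc;
  IS             isu, ischi, isphi;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);

  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nu,   iu,   PETSC_COPY_VALUES, &isu);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nchi, ichi, PETSC_COPY_VALUES, &ischi);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_WORLD, nphi, iphi, PETSC_COPY_VALUES, &isphi);CHKERRQ(ierr);

  ierr = PCFieldSplitSetIS(pc, "u",   isu);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "chi", ischi);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "phi", isphi);CHKERRQ(ierr);

  /* The PC keeps its own reference to each IS. */
  ierr = ISDestroy(&isu);CHKERRQ(ierr);
  ierr = ISDestroy(&ischi);CHKERRQ(ierr);
  ierr = ISDestroy(&isphi);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

After that, each block solver is runtime-selectable through its prefix, for
example something like

  -pc_fieldsplit_type multiplicative
  -fieldsplit_u_ksp_type preonly   -fieldsplit_u_pc_type gamg
  -fieldsplit_chi_ksp_type preonly -fieldsplit_chi_pc_type asm
  -fieldsplit_phi_ksp_type preonly -fieldsplit_phi_pc_type gamg

which again is only a first guess to experiment with, not a recommendation
tuned to your particular system.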
Thanks, Matt > Hom Nath > > On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti > > wrote: > >> > >> Thanks Matt. > >> > >> Attached detailed info on ksp of a much smaller test. This is a > >> multiphysics problem. > > > > > > You are using FGMRES/ASM(ILU0). From your description below, this sounds > > like > > an elliptic system. I would at least try AMG (-pc_type gamg) to see how > it > > does. Any > > other advice would have to be based on seeing the equations. > > > > Thanks, > > > > Matt > > > >> > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti < > hng.email at gmail.com> > >> > wrote: > >> >> > >> >> Dear all, > >> >> > >> >> I take this opportunity to ask for your important suggestion. > >> >> > >> >> I am solving an elastic-acoustic-gravity equation on the planet. I > >> >> have displacement vector (ux,uy,uz) in solid region, displacement > >> >> potential (\xi) and pressure (p) in fluid region, and gravitational > >> >> potential (\phi) in all of space. All these variables are coupled. > >> >> > >> >> Currently, I am using MATMPIAIJ and form a single global matrix. Does > >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > >> >> this case? For your information, total degrees of freedoms are about > a > >> >> billion. > >> > > >> > > >> > 1) For any solver question, we need to see the output of -ksp_view, > and > >> > we > >> > would also like > >> > > >> > -ksp_monitor_true_residual -ksp_converged_reason > >> > > >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the > >> > blocksize > >> > which you > >> > could set without that format > >> > > >> > 3) However, you might see benefit from using something like > PCFIELDSPLIT > >> > if > >> > you have multiphysics here > >> > > >> > Matt > >> > > >> >> > >> >> Any suggestion would be greatly appreciated. > >> >> > >> >> Thanks, > >> >> Hom Nath > >> >> > >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > > >> >> wrote: > >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams > wrote: > >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> I said the Hypre setup cost is not scalable, > >> >> >> > >> >> >> > >> >> >> I'd be a little careful here. Scaling for the matrix triple > product > >> >> >> is > >> >> >> hard and hypre does put effort into scaling. I don't have any data > >> >> >> however. > >> >> >> Do you? > >> >> > > >> >> > > >> >> > I used it for PyLith and saw this. I did not think any AMG had > >> >> > scalable > >> >> > setup time. > >> >> > > >> >> > Matt > >> >> > > >> >> >>> > >> >> >>> but it can be amortized over the iterations. You can quantify > this > >> >> >>> just by looking at the PCSetUp time as your increase the number > of > >> >> >>> processes. I don't think they have a good > >> >> >>> model for the memory usage, and if they do, I do not know what it > >> >> >>> is. > >> >> >>> However, generally Hypre takes more > >> >> >>> memory than the agglomeration MG like ML or GAMG. > >> >> >>> > >> >> >> > >> >> >> agglomerations methods tend to have lower "grid complexity", that > is > >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is more > >> >> >> of a > >> >> >> constant complexity and not a scaling issue though. You can > address > >> >> >> this > >> >> >> with parameters to some extent. But for elasticity, you want to at > >> >> >> least > >> >> >> try, if not start with, GAMG or ML. 
> >> >> >> > >> >> >>> > >> >> >>> Thanks, > >> >> >>> > >> >> >>> Matt > >> >> >>> > >> >> >>>> > >> >> >>>> > >> >> >>>> Giang > >> >> >>>> > >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > >> >> >>>> wrote: > >> >> >>>>> > >> >> >>>>> Hoang Giang Bui writes: > >> >> >>>>> > >> >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >> >>>>> > >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >> >> >>>> > >> >> >>>> > >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> -- > >> >> >>> What most experimenters take for granted before they begin their > >> >> >>> experiments is infinitely more interesting than any results to > >> >> >>> which > >> >> >>> their > >> >> >>> experiments lead. > >> >> >>> -- Norbert Wiener > >> >> >> > >> >> >> > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 12:17:16 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 13:17:16 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks Matt for great suggestion. One last question, do you know whether the GPU capability of current PETSC version is matured enough to try for my problem? Thanks again for your help. Hom Nath On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti > wrote: >> >> Thanks a lot. >> >> With AMG it did not converge within the iteration limit of 3000. >> >> In solid: elastic wave equation with added gravity term \rho \nabla\phi >> In fluid: acoustic wave equation with added gravity term \rho \nabla\phi >> Both solid and fluid: Poisson's equation for gravity >> Outer space: Laplace's equation for gravity >> >> We combine so called mapped infinite element with spectral-element >> method (higher order FEM that uses nodal quadrature) and solve in >> frequency domain. > > > 1) The Poisson and Laplace equation should be using MG, however you are > using SEM, so > you would need to use a low order PC for the high order problem, also > called p-MG (Paul Fischer), see > > http://epubs.siam.org/doi/abs/10.1137/110834512 > > 2) The acoustic wave equation is Helmholtz to us, and that needs special MG > tweaks that > are still research material so I can understand using ASM. > > 3) Same thing for the elastic wave equations. 
Some people say they have this > solved using > hierarchical matrix methods, something like > > http://portal.nersc.gov/project/sparse/strumpack/ > > However, I think the jury is still out. > > If you can do 100 iterations of plain vanilla solvers, that seems like a win > right now. You might improve > the time using FS, but I am not sure about the iterations on the smaller > problem. > > Thanks, > > Matt > >> >> Hom Nath >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti >> > wrote: >> >> >> >> Thanks Matt. >> >> >> >> Attached detailed info on ksp of a much smaller test. This is a >> >> multiphysics problem. >> > >> > >> > You are using FGMRES/ASM(ILU0). From your description below, this sounds >> > like >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see how >> > it >> > does. Any >> > other advice would have to be based on seeing the equations. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> >> > >> >> > wrote: >> >> >> >> >> >> Dear all, >> >> >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. I >> >> >> have displacement vector (ux,uy,uz) in solid region, displacement >> >> >> potential (\xi) and pressure (p) in fluid region, and gravitational >> >> >> potential (\phi) in all of space. All these variables are coupled. >> >> >> >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. >> >> >> Does >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in >> >> >> this case? For your information, total degrees of freedoms are about >> >> >> a >> >> >> billion. >> >> > >> >> > >> >> > 1) For any solver question, we need to see the output of -ksp_view, >> >> > and >> >> > we >> >> > would also like >> >> > >> >> > -ksp_monitor_true_residual -ksp_converged_reason >> >> > >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the >> >> > blocksize >> >> > which you >> >> > could set without that format >> >> > >> >> > 3) However, you might see benefit from using something like >> >> > PCFIELDSPLIT >> >> > if >> >> > you have multiphysics here >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> >> >> Thanks, >> >> >> Hom Nath >> >> >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> >> >> >> >> wrote: >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams >> >> >> > wrote: >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple >> >> >> >> product >> >> >> >> is >> >> >> >> hard and hypre does put effort into scaling. I don't have any >> >> >> >> data >> >> >> >> however. >> >> >> >> Do you? >> >> >> > >> >> >> > >> >> >> > I used it for PyLith and saw this. I did not think any AMG had >> >> >> > scalable >> >> >> > setup time. >> >> >> > >> >> >> > Matt >> >> >> > >> >> >> >>> >> >> >> >>> but it can be amortized over the iterations. You can quantify >> >> >> >>> this >> >> >> >>> just by looking at the PCSetUp time as your increase the number >> >> >> >>> of >> >> >> >>> processes. 
I don't think they have a good >> >> >> >>> model for the memory usage, and if they do, I do not know what >> >> >> >>> it >> >> >> >>> is. >> >> >> >>> However, generally Hypre takes more >> >> >> >>> memory than the agglomeration MG like ML or GAMG. >> >> >> >>> >> >> >> >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", that >> >> >> >> is >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is >> >> >> >> more >> >> >> >> of a >> >> >> >> constant complexity and not a scaling issue though. You can >> >> >> >> address >> >> >> >> this >> >> >> >> with parameters to some extent. But for elasticity, you want to >> >> >> >> at >> >> >> >> least >> >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >> >> >>> >> >> >> >>> Thanks, >> >> >> >>> >> >> >> >>> Matt >> >> >> >>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> Giang >> >> >> >>>> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >> >>>> wrote: >> >> >> >>>>> >> >> >> >>>>> Hoang Giang Bui writes: >> >> >> >>>>> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >> >>>>> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >> >>>> >> >> >> >>>> >> >> >> >>> >> >> >> >>> >> >> >> >>> >> >> >> >>> -- >> >> >> >>> What most experimenters take for granted before they begin their >> >> >> >>> experiments is infinitely more interesting than any results to >> >> >> >>> which >> >> >> >>> their >> >> >> >>> experiments lead. >> >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > What most experimenters take for granted before they begin their >> >> >> > experiments >> >> >> > is infinitely more interesting than any results to which their >> >> >> > experiments >> >> >> > lead. >> >> >> > -- Norbert Wiener >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From hng.email at gmail.com Fri Jan 22 14:19:10 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 15:19:10 -0500 Subject: [petsc-users] PCFIELDSPLIT question Message-ID: Dear all, I am new to PcFieldSplit. I have a matrix formed using MATMPIAIJ. Is it possible to use PCFIELDSPLIT operations in this type of matrix? Or does it have to be MATMPIBIJ or MATNEST format? If possible for MATMPIAIJ, could anybody provide me a simple example or few steps? Variables in the equations are displacement vector, scalar potential and pressure. Thanks for help. Hom Nath From mfadams at lbl.gov Fri Jan 22 14:44:15 2016 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 22 Jan 2016 15:44:15 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: > > > I used it for PyLith and saw this. I did not think any AMG had scalable > setup time. 
> > OK, I am guessing it was scaling poorly, weak scaling, but it was sublinear after some saturation at the beginning. I have not done a weak scaling study on matrix setup (RAP primarily) ever, but I did on Prometheus in the GB work. Prometheus' RAP was pretty simple also and PETSc's is probably faster, and hence may look less scalable. I don't think there is anything fundamentally unscalable about RAP, it is just a complicated algorithm and we have never gotten around to doing all the things that we would like to do with it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 15:33:13 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 15:33:13 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti wrote: > Thanks Matt for great suggestion. One last question, do you know > whether the GPU capability of current PETSC version is matured enough > to try for my problem? > The only thing that would really make sense to do on the GPU is the SEM integration, which would not be part of PETSc. This is what SPECFEM has optimized. Thanks, Matt > Thanks again for your help. > Hom Nath > > On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti > > wrote: > >> > >> Thanks a lot. > >> > >> With AMG it did not converge within the iteration limit of 3000. > >> > >> In solid: elastic wave equation with added gravity term \rho \nabla\phi > >> In fluid: acoustic wave equation with added gravity term \rho \nabla\phi > >> Both solid and fluid: Poisson's equation for gravity > >> Outer space: Laplace's equation for gravity > >> > >> We combine so called mapped infinite element with spectral-element > >> method (higher order FEM that uses nodal quadrature) and solve in > >> frequency domain. > > > > > > 1) The Poisson and Laplace equation should be using MG, however you are > > using SEM, so > > you would need to use a low order PC for the high order problem, also > > called p-MG (Paul Fischer), see > > > > http://epubs.siam.org/doi/abs/10.1137/110834512 > > > > 2) The acoustic wave equation is Helmholtz to us, and that needs special > MG > > tweaks that > > are still research material so I can understand using ASM. > > > > 3) Same thing for the elastic wave equations. Some people say they have > this > > solved using > > hierarchical matrix methods, something like > > > > http://portal.nersc.gov/project/sparse/strumpack/ > > > > However, I think the jury is still out. > > > > If you can do 100 iterations of plain vanilla solvers, that seems like a > win > > right now. You might improve > > the time using FS, but I am not sure about the iterations on the smaller > > problem. > > > > Thanks, > > > > Matt > > > >> > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti < > hng.email at gmail.com> > >> > wrote: > >> >> > >> >> Thanks Matt. > >> >> > >> >> Attached detailed info on ksp of a much smaller test. This is a > >> >> multiphysics problem. > >> > > >> > > >> > You are using FGMRES/ASM(ILU0). From your description below, this > sounds > >> > like > >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see > how > >> > it > >> > does. 
Any > >> > other advice would have to be based on seeing the equations. > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> Hom Nath > >> >> > >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > > >> >> wrote: > >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Dear all, > >> >> >> > >> >> >> I take this opportunity to ask for your important suggestion. > >> >> >> > >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. I > >> >> >> have displacement vector (ux,uy,uz) in solid region, displacement > >> >> >> potential (\xi) and pressure (p) in fluid region, and > gravitational > >> >> >> potential (\phi) in all of space. All these variables are coupled. > >> >> >> > >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. > >> >> >> Does > >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency in > >> >> >> this case? For your information, total degrees of freedoms are > about > >> >> >> a > >> >> >> billion. > >> >> > > >> >> > > >> >> > 1) For any solver question, we need to see the output of -ksp_view, > >> >> > and > >> >> > we > >> >> > would also like > >> >> > > >> >> > -ksp_monitor_true_residual -ksp_converged_reason > >> >> > > >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the > >> >> > blocksize > >> >> > which you > >> >> > could set without that format > >> >> > > >> >> > 3) However, you might see benefit from using something like > >> >> > PCFIELDSPLIT > >> >> > if > >> >> > you have multiphysics here > >> >> > > >> >> > Matt > >> >> > > >> >> >> > >> >> >> Any suggestion would be greatly appreciated. > >> >> >> > >> >> >> Thanks, > >> >> >> Hom Nath > >> >> >> > >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > >> >> >> > >> >> >> wrote: > >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams > >> >> >> > wrote: > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> I said the Hypre setup cost is not scalable, > >> >> >> >> > >> >> >> >> > >> >> >> >> I'd be a little careful here. Scaling for the matrix triple > >> >> >> >> product > >> >> >> >> is > >> >> >> >> hard and hypre does put effort into scaling. I don't have any > >> >> >> >> data > >> >> >> >> however. > >> >> >> >> Do you? > >> >> >> > > >> >> >> > > >> >> >> > I used it for PyLith and saw this. I did not think any AMG had > >> >> >> > scalable > >> >> >> > setup time. > >> >> >> > > >> >> >> > Matt > >> >> >> > > >> >> >> >>> > >> >> >> >>> but it can be amortized over the iterations. You can quantify > >> >> >> >>> this > >> >> >> >>> just by looking at the PCSetUp time as your increase the > number > >> >> >> >>> of > >> >> >> >>> processes. I don't think they have a good > >> >> >> >>> model for the memory usage, and if they do, I do not know what > >> >> >> >>> it > >> >> >> >>> is. > >> >> >> >>> However, generally Hypre takes more > >> >> >> >>> memory than the agglomeration MG like ML or GAMG. > >> >> >> >>> > >> >> >> >> > >> >> >> >> agglomerations methods tend to have lower "grid complexity", > that > >> >> >> >> is > >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is > >> >> >> >> more > >> >> >> >> of a > >> >> >> >> constant complexity and not a scaling issue though. You can > >> >> >> >> address > >> >> >> >> this > >> >> >> >> with parameters to some extent. But for elasticity, you want to > >> >> >> >> at > >> >> >> >> least > >> >> >> >> try, if not start with, GAMG or ML. 
> >> >> >> >> > >> >> >> >>> > >> >> >> >>> Thanks, > >> >> >> >>> > >> >> >> >>> Matt > >> >> >> >>> > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> Giang > >> >> >> >>>> > >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > > >> >> >> >>>> wrote: > >> >> >> >>>>> > >> >> >> >>>>> Hoang Giang Bui writes: > >> >> >> >>>>> > >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >> >> >>>>> > >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > >> >> >> >>> -- > >> >> >> >>> What most experimenters take for granted before they begin > their > >> >> >> >>> experiments is infinitely more interesting than any results to > >> >> >> >>> which > >> >> >> >>> their > >> >> >> >>> experiments lead. > >> >> >> >>> -- Norbert Wiener > >> >> >> >> > >> >> >> >> > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > What most experimenters take for granted before they begin their > >> >> >> > experiments > >> >> >> > is infinitely more interesting than any results to which their > >> >> >> > experiments > >> >> >> > lead. > >> >> >> > -- Norbert Wiener > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 15:47:13 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 16:47:13 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Hi Matt, SPECFEM currently has only an explicit time scheme and does not have full gravity implemented. I am adding implicit time scheme and full gravity so that it can be used for interesting quasistatic problems such as glacial rebound, post seismic relaxation etc. I am using Petsc as a linear solver which I would like to see GPU implemented. Thanks, Hom Nath On Fri, Jan 22, 2016 at 4:33 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti > wrote: >> >> Thanks Matt for great suggestion. One last question, do you know >> whether the GPU capability of current PETSC version is matured enough >> to try for my problem? > > > The only thing that would really make sense to do on the GPU is the SEM > integration, which > would not be part of PETSc. This is what SPECFEM has optimized. > > Thanks, > > Matt > >> >> Thanks again for your help. 
>> Hom Nath >> >> On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti >> > wrote: >> >> >> >> Thanks a lot. >> >> >> >> With AMG it did not converge within the iteration limit of 3000. >> >> >> >> In solid: elastic wave equation with added gravity term \rho \nabla\phi >> >> In fluid: acoustic wave equation with added gravity term \rho >> >> \nabla\phi >> >> Both solid and fluid: Poisson's equation for gravity >> >> Outer space: Laplace's equation for gravity >> >> >> >> We combine so called mapped infinite element with spectral-element >> >> method (higher order FEM that uses nodal quadrature) and solve in >> >> frequency domain. >> > >> > >> > 1) The Poisson and Laplace equation should be using MG, however you are >> > using SEM, so >> > you would need to use a low order PC for the high order problem, >> > also >> > called p-MG (Paul Fischer), see >> > >> > http://epubs.siam.org/doi/abs/10.1137/110834512 >> > >> > 2) The acoustic wave equation is Helmholtz to us, and that needs special >> > MG >> > tweaks that >> > are still research material so I can understand using ASM. >> > >> > 3) Same thing for the elastic wave equations. Some people say they have >> > this >> > solved using >> > hierarchical matrix methods, something like >> > >> > http://portal.nersc.gov/project/sparse/strumpack/ >> > >> > However, I think the jury is still out. >> > >> > If you can do 100 iterations of plain vanilla solvers, that seems like a >> > win >> > right now. You might improve >> > the time using FS, but I am not sure about the iterations on the smaller >> > problem. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti >> >> > >> >> > wrote: >> >> >> >> >> >> Thanks Matt. >> >> >> >> >> >> Attached detailed info on ksp of a much smaller test. This is a >> >> >> multiphysics problem. >> >> > >> >> > >> >> > You are using FGMRES/ASM(ILU0). From your description below, this >> >> > sounds >> >> > like >> >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see >> >> > how >> >> > it >> >> > does. Any >> >> > other advice would have to be based on seeing the equations. >> >> > >> >> > Thanks, >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Hom Nath >> >> >> >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> >> >> >> >> >> wrote: >> >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> Dear all, >> >> >> >> >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> >> >> >> >> I am solving an elastic-acoustic-gravity equation on the planet. >> >> >> >> I >> >> >> >> have displacement vector (ux,uy,uz) in solid region, displacement >> >> >> >> potential (\xi) and pressure (p) in fluid region, and >> >> >> >> gravitational >> >> >> >> potential (\phi) in all of space. All these variables are >> >> >> >> coupled. >> >> >> >> >> >> >> >> Currently, I am using MATMPIAIJ and form a single global matrix. >> >> >> >> Does >> >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency >> >> >> >> in >> >> >> >> this case? For your information, total degrees of freedoms are >> >> >> >> about >> >> >> >> a >> >> >> >> billion. 
>> >> >> > >> >> >> > >> >> >> > 1) For any solver question, we need to see the output of >> >> >> > -ksp_view, >> >> >> > and >> >> >> > we >> >> >> > would also like >> >> >> > >> >> >> > -ksp_monitor_true_residual -ksp_converged_reason >> >> >> > >> >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in the >> >> >> > blocksize >> >> >> > which you >> >> >> > could set without that format >> >> >> > >> >> >> > 3) However, you might see benefit from using something like >> >> >> > PCFIELDSPLIT >> >> >> > if >> >> >> > you have multiphysics here >> >> >> > >> >> >> > Matt >> >> >> > >> >> >> >> >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> >> >> >> >> Thanks, >> >> >> >> Hom Nath >> >> >> >> >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> >> >> >> >> >> >> wrote: >> >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams >> >> >> >> > wrote: >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple >> >> >> >> >> product >> >> >> >> >> is >> >> >> >> >> hard and hypre does put effort into scaling. I don't have any >> >> >> >> >> data >> >> >> >> >> however. >> >> >> >> >> Do you? >> >> >> >> > >> >> >> >> > >> >> >> >> > I used it for PyLith and saw this. I did not think any AMG had >> >> >> >> > scalable >> >> >> >> > setup time. >> >> >> >> > >> >> >> >> > Matt >> >> >> >> > >> >> >> >> >>> >> >> >> >> >>> but it can be amortized over the iterations. You can quantify >> >> >> >> >>> this >> >> >> >> >>> just by looking at the PCSetUp time as your increase the >> >> >> >> >>> number >> >> >> >> >>> of >> >> >> >> >>> processes. I don't think they have a good >> >> >> >> >>> model for the memory usage, and if they do, I do not know >> >> >> >> >>> what >> >> >> >> >>> it >> >> >> >> >>> is. >> >> >> >> >>> However, generally Hypre takes more >> >> >> >> >>> memory than the agglomeration MG like ML or GAMG. >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", >> >> >> >> >> that >> >> >> >> >> is >> >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis is >> >> >> >> >> more >> >> >> >> >> of a >> >> >> >> >> constant complexity and not a scaling issue though. You can >> >> >> >> >> address >> >> >> >> >> this >> >> >> >> >> with parameters to some extent. But for elasticity, you want >> >> >> >> >> to >> >> >> >> >> at >> >> >> >> >> least >> >> >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >>> Thanks, >> >> >> >> >>> >> >> >> >> >>> Matt >> >> >> >> >>> >> >> >> >> >>>> >> >> >> >> >>>> >> >> >> >> >>>> Giang >> >> >> >> >>>> >> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >> >> >>>> >> >> >> >> >>>> wrote: >> >> >> >> >>>>> >> >> >> >> >>>>> Hoang Giang Bui writes: >> >> >> >> >>>>> >> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >> >> >>>>> >> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >> >> >>>> >> >> >> >> >>>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> -- >> >> >> >> >>> What most experimenters take for granted before they begin >> >> >> >> >>> their >> >> >> >> >>> experiments is infinitely more interesting than any results >> >> >> >> >>> to >> >> >> >> >>> which >> >> >> >> >>> their >> >> >> >> >>> experiments lead. 
>> >> >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > -- >> >> >> >> > What most experimenters take for granted before they begin >> >> >> >> > their >> >> >> >> > experiments >> >> >> >> > is infinitely more interesting than any results to which their >> >> >> >> > experiments >> >> >> >> > lead. >> >> >> >> > -- Norbert Wiener >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > What most experimenters take for granted before they begin their >> >> >> > experiments >> >> >> > is infinitely more interesting than any results to which their >> >> >> > experiments >> >> >> > lead. >> >> >> > -- Norbert Wiener >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. >> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From knepley at gmail.com Fri Jan 22 16:06:09 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 16:06:09 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Fri, Jan 22, 2016 at 3:47 PM, Hom Nath Gharti wrote: > Hi Matt, > > SPECFEM currently has only an explicit time scheme and does not have > full gravity implemented. I am adding implicit time scheme and full > gravity so that it can be used for interesting quasistatic problems > such as glacial rebound, post seismic relaxation etc. I am using Petsc > as a linear solver which I would like to see GPU implemented. > Why? It really does not make sense for those operations. It is an unfortunate fact, but the usefulness of GPUs has been oversold. You can certainly get some mileage out of a SpMV on the GPU, but there the maximum win is maybe 2x or less for a nice CPU, and then you have to account for transfer time and other latencies. Unless you have a really compelling case, I would not waste your time. To come to this opinion, I used years of my own time looking at GPUs. Thanks, Matt > Thanks, > Hom Nath > > On Fri, Jan 22, 2016 at 4:33 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti > > wrote: > >> > >> Thanks Matt for great suggestion. One last question, do you know > >> whether the GPU capability of current PETSC version is matured enough > >> to try for my problem? > > > > > > The only thing that would really make sense to do on the GPU is the SEM > > integration, which > > would not be part of PETSc. This is what SPECFEM has optimized. > > > > Thanks, > > > > Matt > > > >> > >> Thanks again for your help. > >> Hom Nath > >> > >> On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley > >> wrote: > >> > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti < > hng.email at gmail.com> > >> > wrote: > >> >> > >> >> Thanks a lot. > >> >> > >> >> With AMG it did not converge within the iteration limit of 3000. 
> >> >> > >> >> In solid: elastic wave equation with added gravity term \rho > \nabla\phi > >> >> In fluid: acoustic wave equation with added gravity term \rho > >> >> \nabla\phi > >> >> Both solid and fluid: Poisson's equation for gravity > >> >> Outer space: Laplace's equation for gravity > >> >> > >> >> We combine so called mapped infinite element with spectral-element > >> >> method (higher order FEM that uses nodal quadrature) and solve in > >> >> frequency domain. > >> > > >> > > >> > 1) The Poisson and Laplace equation should be using MG, however you > are > >> > using SEM, so > >> > you would need to use a low order PC for the high order problem, > >> > also > >> > called p-MG (Paul Fischer), see > >> > > >> > http://epubs.siam.org/doi/abs/10.1137/110834512 > >> > > >> > 2) The acoustic wave equation is Helmholtz to us, and that needs > special > >> > MG > >> > tweaks that > >> > are still research material so I can understand using ASM. > >> > > >> > 3) Same thing for the elastic wave equations. Some people say they > have > >> > this > >> > solved using > >> > hierarchical matrix methods, something like > >> > > >> > http://portal.nersc.gov/project/sparse/strumpack/ > >> > > >> > However, I think the jury is still out. > >> > > >> > If you can do 100 iterations of plain vanilla solvers, that seems > like a > >> > win > >> > right now. You might improve > >> > the time using FS, but I am not sure about the iterations on the > smaller > >> > problem. > >> > > >> > Thanks, > >> > > >> > Matt > >> > > >> >> > >> >> Hom Nath > >> >> > >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley > > >> >> wrote: > >> >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Thanks Matt. > >> >> >> > >> >> >> Attached detailed info on ksp of a much smaller test. This is a > >> >> >> multiphysics problem. > >> >> > > >> >> > > >> >> > You are using FGMRES/ASM(ILU0). From your description below, this > >> >> > sounds > >> >> > like > >> >> > an elliptic system. I would at least try AMG (-pc_type gamg) to see > >> >> > how > >> >> > it > >> >> > does. Any > >> >> > other advice would have to be based on seeing the equations. > >> >> > > >> >> > Thanks, > >> >> > > >> >> > Matt > >> >> > > >> >> >> > >> >> >> Hom Nath > >> >> >> > >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley > >> >> >> > >> >> >> wrote: > >> >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti > >> >> >> > > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> Dear all, > >> >> >> >> > >> >> >> >> I take this opportunity to ask for your important suggestion. > >> >> >> >> > >> >> >> >> I am solving an elastic-acoustic-gravity equation on the > planet. > >> >> >> >> I > >> >> >> >> have displacement vector (ux,uy,uz) in solid region, > displacement > >> >> >> >> potential (\xi) and pressure (p) in fluid region, and > >> >> >> >> gravitational > >> >> >> >> potential (\phi) in all of space. All these variables are > >> >> >> >> coupled. > >> >> >> >> > >> >> >> >> Currently, I am using MATMPIAIJ and form a single global > matrix. > >> >> >> >> Does > >> >> >> >> using a MATMPIBIJ or MATNEST improve the convergence/efficiency > >> >> >> >> in > >> >> >> >> this case? For your information, total degrees of freedoms are > >> >> >> >> about > >> >> >> >> a > >> >> >> >> billion. 
> >> >> >> > > >> >> >> > > >> >> >> > 1) For any solver question, we need to see the output of > >> >> >> > -ksp_view, > >> >> >> > and > >> >> >> > we > >> >> >> > would also like > >> >> >> > > >> >> >> > -ksp_monitor_true_residual -ksp_converged_reason > >> >> >> > > >> >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in > the > >> >> >> > blocksize > >> >> >> > which you > >> >> >> > could set without that format > >> >> >> > > >> >> >> > 3) However, you might see benefit from using something like > >> >> >> > PCFIELDSPLIT > >> >> >> > if > >> >> >> > you have multiphysics here > >> >> >> > > >> >> >> > Matt > >> >> >> > > >> >> >> >> > >> >> >> >> Any suggestion would be greatly appreciated. > >> >> >> >> > >> >> >> >> Thanks, > >> >> >> >> Hom Nath > >> >> >> >> > >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley > >> >> >> >> > >> >> >> >> wrote: > >> >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams > > >> >> >> >> > wrote: > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> I said the Hypre setup cost is not scalable, > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> I'd be a little careful here. Scaling for the matrix triple > >> >> >> >> >> product > >> >> >> >> >> is > >> >> >> >> >> hard and hypre does put effort into scaling. I don't have > any > >> >> >> >> >> data > >> >> >> >> >> however. > >> >> >> >> >> Do you? > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > I used it for PyLith and saw this. I did not think any AMG > had > >> >> >> >> > scalable > >> >> >> >> > setup time. > >> >> >> >> > > >> >> >> >> > Matt > >> >> >> >> > > >> >> >> >> >>> > >> >> >> >> >>> but it can be amortized over the iterations. You can > quantify > >> >> >> >> >>> this > >> >> >> >> >>> just by looking at the PCSetUp time as your increase the > >> >> >> >> >>> number > >> >> >> >> >>> of > >> >> >> >> >>> processes. I don't think they have a good > >> >> >> >> >>> model for the memory usage, and if they do, I do not know > >> >> >> >> >>> what > >> >> >> >> >>> it > >> >> >> >> >>> is. > >> >> >> >> >>> However, generally Hypre takes more > >> >> >> >> >>> memory than the agglomeration MG like ML or GAMG. > >> >> >> >> >>> > >> >> >> >> >> > >> >> >> >> >> agglomerations methods tend to have lower "grid complexity", > >> >> >> >> >> that > >> >> >> >> >> is > >> >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis > is > >> >> >> >> >> more > >> >> >> >> >> of a > >> >> >> >> >> constant complexity and not a scaling issue though. You can > >> >> >> >> >> address > >> >> >> >> >> this > >> >> >> >> >> with parameters to some extent. But for elasticity, you want > >> >> >> >> >> to > >> >> >> >> >> at > >> >> >> >> >> least > >> >> >> >> >> try, if not start with, GAMG or ML. > >> >> >> >> >> > >> >> >> >> >>> > >> >> >> >> >>> Thanks, > >> >> >> >> >>> > >> >> >> >> >>> Matt > >> >> >> >> >>> > >> >> >> >> >>>> > >> >> >> >> >>>> > >> >> >> >> >>>> Giang > >> >> >> >> >>>> > >> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown > >> >> >> >> >>>> > >> >> >> >> >>>> wrote: > >> >> >> >> >>>>> > >> >> >> >> >>>>> Hoang Giang Bui writes: > >> >> >> >> >>>>> > >> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? > >> >> >> >> >>>>> > >> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". 
> >> >> >> >> >>>> > >> >> >> >> >>>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> > >> >> >> >> >>> -- > >> >> >> >> >>> What most experimenters take for granted before they begin > >> >> >> >> >>> their > >> >> >> >> >>> experiments is infinitely more interesting than any results > >> >> >> >> >>> to > >> >> >> >> >>> which > >> >> >> >> >>> their > >> >> >> >> >>> experiments lead. > >> >> >> >> >>> -- Norbert Wiener > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > -- > >> >> >> >> > What most experimenters take for granted before they begin > >> >> >> >> > their > >> >> >> >> > experiments > >> >> >> >> > is infinitely more interesting than any results to which > their > >> >> >> >> > experiments > >> >> >> >> > lead. > >> >> >> >> > -- Norbert Wiener > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > -- > >> >> >> > What most experimenters take for granted before they begin their > >> >> >> > experiments > >> >> >> > is infinitely more interesting than any results to which their > >> >> >> > experiments > >> >> >> > lead. > >> >> >> > -- Norbert Wiener > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > What most experimenters take for granted before they begin their > >> >> > experiments > >> >> > is infinitely more interesting than any results to which their > >> >> > experiments > >> >> > lead. > >> >> > -- Norbert Wiener > >> > > >> > > >> > > >> > > >> > -- > >> > What most experimenters take for granted before they begin their > >> > experiments > >> > is infinitely more interesting than any results to which their > >> > experiments > >> > lead. > >> > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments > > is infinitely more interesting than any results to which their > experiments > > lead. > > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Fri Jan 22 16:11:14 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Fri, 22 Jan 2016 17:11:14 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Thanks for your suggestions! If it's just 2X, I will not waste my time! Hom Nath On Fri, Jan 22, 2016 at 5:06 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 3:47 PM, Hom Nath Gharti > wrote: >> >> Hi Matt, >> >> SPECFEM currently has only an explicit time scheme and does not have >> full gravity implemented. I am adding implicit time scheme and full >> gravity so that it can be used for interesting quasistatic problems >> such as glacial rebound, post seismic relaxation etc. I am using Petsc >> as a linear solver which I would like to see GPU implemented. > > > Why? It really does not make sense for those operations. > > It is an unfortunate fact, but the usefulness of GPUs has been oversold. You > can certainly > get some mileage out of a SpMV on the GPU, but there the maximum win is > maybe 2x or > less for a nice CPU, and then you have to account for transfer time and > other latencies. > Unless you have a really compelling case, I would not waste your time. 
> > To come to this opinion, I used years of my own time looking at GPUs. > > Thanks, > > Matt > >> >> Thanks, >> Hom Nath >> >> On Fri, Jan 22, 2016 at 4:33 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 12:17 PM, Hom Nath Gharti >> > wrote: >> >> >> >> Thanks Matt for great suggestion. One last question, do you know >> >> whether the GPU capability of current PETSC version is matured enough >> >> to try for my problem? >> > >> > >> > The only thing that would really make sense to do on the GPU is the SEM >> > integration, which >> > would not be part of PETSc. This is what SPECFEM has optimized. >> > >> > Thanks, >> > >> > Matt >> > >> >> >> >> Thanks again for your help. >> >> Hom Nath >> >> >> >> On Fri, Jan 22, 2016 at 1:07 PM, Matthew Knepley >> >> wrote: >> >> > On Fri, Jan 22, 2016 at 11:47 AM, Hom Nath Gharti >> >> > >> >> > wrote: >> >> >> >> >> >> Thanks a lot. >> >> >> >> >> >> With AMG it did not converge within the iteration limit of 3000. >> >> >> >> >> >> In solid: elastic wave equation with added gravity term \rho >> >> >> \nabla\phi >> >> >> In fluid: acoustic wave equation with added gravity term \rho >> >> >> \nabla\phi >> >> >> Both solid and fluid: Poisson's equation for gravity >> >> >> Outer space: Laplace's equation for gravity >> >> >> >> >> >> We combine so called mapped infinite element with spectral-element >> >> >> method (higher order FEM that uses nodal quadrature) and solve in >> >> >> frequency domain. >> >> > >> >> > >> >> > 1) The Poisson and Laplace equation should be using MG, however you >> >> > are >> >> > using SEM, so >> >> > you would need to use a low order PC for the high order problem, >> >> > also >> >> > called p-MG (Paul Fischer), see >> >> > >> >> > http://epubs.siam.org/doi/abs/10.1137/110834512 >> >> > >> >> > 2) The acoustic wave equation is Helmholtz to us, and that needs >> >> > special >> >> > MG >> >> > tweaks that >> >> > are still research material so I can understand using ASM. >> >> > >> >> > 3) Same thing for the elastic wave equations. Some people say they >> >> > have >> >> > this >> >> > solved using >> >> > hierarchical matrix methods, something like >> >> > >> >> > http://portal.nersc.gov/project/sparse/strumpack/ >> >> > >> >> > However, I think the jury is still out. >> >> > >> >> > If you can do 100 iterations of plain vanilla solvers, that seems >> >> > like a >> >> > win >> >> > right now. You might improve >> >> > the time using FS, but I am not sure about the iterations on the >> >> > smaller >> >> > problem. >> >> > >> >> > Thanks, >> >> > >> >> > Matt >> >> > >> >> >> >> >> >> Hom Nath >> >> >> >> >> >> On Fri, Jan 22, 2016 at 12:16 PM, Matthew Knepley >> >> >> >> >> >> wrote: >> >> >> > On Fri, Jan 22, 2016 at 11:10 AM, Hom Nath Gharti >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> Thanks Matt. >> >> >> >> >> >> >> >> Attached detailed info on ksp of a much smaller test. This is a >> >> >> >> multiphysics problem. >> >> >> > >> >> >> > >> >> >> > You are using FGMRES/ASM(ILU0). From your description below, this >> >> >> > sounds >> >> >> > like >> >> >> > an elliptic system. I would at least try AMG (-pc_type gamg) to >> >> >> > see >> >> >> > how >> >> >> > it >> >> >> > does. Any >> >> >> > other advice would have to be based on seeing the equations. 
>> >> >> > >> >> >> > Thanks, >> >> >> > >> >> >> > Matt >> >> >> > >> >> >> >> >> >> >> >> Hom Nath >> >> >> >> >> >> >> >> On Fri, Jan 22, 2016 at 12:01 PM, Matthew Knepley >> >> >> >> >> >> >> >> wrote: >> >> >> >> > On Fri, Jan 22, 2016 at 10:52 AM, Hom Nath Gharti >> >> >> >> > >> >> >> >> > wrote: >> >> >> >> >> >> >> >> >> >> Dear all, >> >> >> >> >> >> >> >> >> >> I take this opportunity to ask for your important suggestion. >> >> >> >> >> >> >> >> >> >> I am solving an elastic-acoustic-gravity equation on the >> >> >> >> >> planet. >> >> >> >> >> I >> >> >> >> >> have displacement vector (ux,uy,uz) in solid region, >> >> >> >> >> displacement >> >> >> >> >> potential (\xi) and pressure (p) in fluid region, and >> >> >> >> >> gravitational >> >> >> >> >> potential (\phi) in all of space. All these variables are >> >> >> >> >> coupled. >> >> >> >> >> >> >> >> >> >> Currently, I am using MATMPIAIJ and form a single global >> >> >> >> >> matrix. >> >> >> >> >> Does >> >> >> >> >> using a MATMPIBIJ or MATNEST improve the >> >> >> >> >> convergence/efficiency >> >> >> >> >> in >> >> >> >> >> this case? For your information, total degrees of freedoms are >> >> >> >> >> about >> >> >> >> >> a >> >> >> >> >> billion. >> >> >> >> > >> >> >> >> > >> >> >> >> > 1) For any solver question, we need to see the output of >> >> >> >> > -ksp_view, >> >> >> >> > and >> >> >> >> > we >> >> >> >> > would also like >> >> >> >> > >> >> >> >> > -ksp_monitor_true_residual -ksp_converged_reason >> >> >> >> > >> >> >> >> > 2) MATNEST does not affect convergence, and MATMPIBAIJ only in >> >> >> >> > the >> >> >> >> > blocksize >> >> >> >> > which you >> >> >> >> > could set without that format >> >> >> >> > >> >> >> >> > 3) However, you might see benefit from using something like >> >> >> >> > PCFIELDSPLIT >> >> >> >> > if >> >> >> >> > you have multiphysics here >> >> >> >> > >> >> >> >> > Matt >> >> >> >> > >> >> >> >> >> >> >> >> >> >> Any suggestion would be greatly appreciated. >> >> >> >> >> >> >> >> >> >> Thanks, >> >> >> >> >> Hom Nath >> >> >> >> >> >> >> >> >> >> On Fri, Jan 22, 2016 at 10:32 AM, Matthew Knepley >> >> >> >> >> >> >> >> >> >> wrote: >> >> >> >> >> > On Fri, Jan 22, 2016 at 9:27 AM, Mark Adams >> >> >> >> >> > >> >> >> >> >> > wrote: >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> I said the Hypre setup cost is not scalable, >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> I'd be a little careful here. Scaling for the matrix >> >> >> >> >> >> triple >> >> >> >> >> >> product >> >> >> >> >> >> is >> >> >> >> >> >> hard and hypre does put effort into scaling. I don't have >> >> >> >> >> >> any >> >> >> >> >> >> data >> >> >> >> >> >> however. >> >> >> >> >> >> Do you? >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > I used it for PyLith and saw this. I did not think any AMG >> >> >> >> >> > had >> >> >> >> >> > scalable >> >> >> >> >> > setup time. >> >> >> >> >> > >> >> >> >> >> > Matt >> >> >> >> >> > >> >> >> >> >> >>> >> >> >> >> >> >>> but it can be amortized over the iterations. You can >> >> >> >> >> >>> quantify >> >> >> >> >> >>> this >> >> >> >> >> >>> just by looking at the PCSetUp time as your increase the >> >> >> >> >> >>> number >> >> >> >> >> >>> of >> >> >> >> >> >>> processes. I don't think they have a good >> >> >> >> >> >>> model for the memory usage, and if they do, I do not know >> >> >> >> >> >>> what >> >> >> >> >> >>> it >> >> >> >> >> >>> is. 
>> >> >> >> >> >>> However, generally Hypre takes more >> >> >> >> >> >>> memory than the agglomeration MG like ML or GAMG. >> >> >> >> >> >>> >> >> >> >> >> >> >> >> >> >> >> >> agglomerations methods tend to have lower "grid >> >> >> >> >> >> complexity", >> >> >> >> >> >> that >> >> >> >> >> >> is >> >> >> >> >> >> smaller coarse grids, than classic AMG like in hypre. THis >> >> >> >> >> >> is >> >> >> >> >> >> more >> >> >> >> >> >> of a >> >> >> >> >> >> constant complexity and not a scaling issue though. You >> >> >> >> >> >> can >> >> >> >> >> >> address >> >> >> >> >> >> this >> >> >> >> >> >> with parameters to some extent. But for elasticity, you >> >> >> >> >> >> want >> >> >> >> >> >> to >> >> >> >> >> >> at >> >> >> >> >> >> least >> >> >> >> >> >> try, if not start with, GAMG or ML. >> >> >> >> >> >> >> >> >> >> >> >>> >> >> >> >> >> >>> Thanks, >> >> >> >> >> >>> >> >> >> >> >> >>> Matt >> >> >> >> >> >>> >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >> >>>> Giang >> >> >> >> >> >>>> >> >> >> >> >> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown >> >> >> >> >> >>>> >> >> >> >> >> >>>> wrote: >> >> >> >> >> >>>>> >> >> >> >> >> >>>>> Hoang Giang Bui writes: >> >> >> >> >> >>>>> >> >> >> >> >> >>>>> > Why P2/P2 is not for co-located discretization? >> >> >> >> >> >>>>> >> >> >> >> >> >>>>> Matt typed "P2/P2" when me meant "P2/P1". >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> >> >> >> >> >> >>> -- >> >> >> >> >> >>> What most experimenters take for granted before they begin >> >> >> >> >> >>> their >> >> >> >> >> >>> experiments is infinitely more interesting than any >> >> >> >> >> >>> results >> >> >> >> >> >>> to >> >> >> >> >> >>> which >> >> >> >> >> >>> their >> >> >> >> >> >>> experiments lead. >> >> >> >> >> >>> -- Norbert Wiener >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > -- >> >> >> >> >> > What most experimenters take for granted before they begin >> >> >> >> >> > their >> >> >> >> >> > experiments >> >> >> >> >> > is infinitely more interesting than any results to which >> >> >> >> >> > their >> >> >> >> >> > experiments >> >> >> >> >> > lead. >> >> >> >> >> > -- Norbert Wiener >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > -- >> >> >> >> > What most experimenters take for granted before they begin >> >> >> >> > their >> >> >> >> > experiments >> >> >> >> > is infinitely more interesting than any results to which their >> >> >> >> > experiments >> >> >> >> > lead. >> >> >> >> > -- Norbert Wiener >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > What most experimenters take for granted before they begin their >> >> >> > experiments >> >> >> > is infinitely more interesting than any results to which their >> >> >> > experiments >> >> >> > lead. >> >> >> > -- Norbert Wiener >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > What most experimenters take for granted before they begin their >> >> > experiments >> >> > is infinitely more interesting than any results to which their >> >> > experiments >> >> > lead. >> >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> > experiments >> > is infinitely more interesting than any results to which their >> > experiments >> > lead. 
>> > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From jychang48 at gmail.com Fri Jan 22 16:27:00 2016 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 22 Jan 2016 15:27:00 -0700 Subject: [petsc-users] Optimization methods in PETSc/TAO Message-ID: Hi all, Consider the following problem: minimize 1/2 - subject to c >= 0 (P1) To solve (P1) using TAO, I recall that there were two recommended solvers to use: TRON and BLMVM I recently got reviews for this paper of mine that uses BLMVM and got hammered for this, as I quote, "convenient yet inadequate choice" of solver. It was suggested that I use either semi smooth Newton methods or projected Newton methods for the optimization problem. My question is, are these methodologies/solvers available currently within PETSc/TAO? 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year ago but it didn't seem to work. I believe I was told by one of the PETSc developers (Matt?) that this was not the one to use? 2) Is TRON a type of projected Newton method? I know it's an active-set Newton trust region, but is this a well-accepted high performing optimization method to use? I was also referred to ROL: https://trilinos.org/packages/rol but I am guessing this isn't accessible/downloadable from petsc at the moment? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Jan 22 16:42:20 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 22 Jan 2016 16:42:20 -0600 Subject: [petsc-users] Optimization methods in PETSc/TAO In-Reply-To: References: Message-ID: On Fri, Jan 22, 2016 at 4:27 PM, Justin Chang wrote: > Hi all, > > Consider the following problem: > > minimize 1/2 - > subject to c >= 0 (P1) > > To solve (P1) using TAO, I recall that there were two recommended solvers > to use: TRON and BLMVM > > I recently got reviews for this paper of mine that uses BLMVM and got > hammered for this, as I quote, "convenient yet inadequate choice" of > solver. > If they did not back this up with a citation it is just empty snobbery, not surprising from some quarters. > It was suggested that I use either semi smooth Newton methods or > projected Newton methods for the optimization problem. My question is, are > these methodologies/solvers available currently within PETSc/TAO? > You can Google TRON and BLMVM and they come up on the NEOS pages. BLMVM is a gradient descent method, but TRON is a Newton method, so trying it may silence the doubters. Matt > 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year > ago but it didn't seem to work. I believe I was told by one of the PETSc > developers (Matt?) that this was not the one to use? > > 2) Is TRON a type of projected Newton method? I know it's an active-set > Newton trust region, but is this a well-accepted high performing > optimization method to use? > > I was also referred to ROL: https://trilinos.org/packages/rol but I am > guessing this isn't accessible/downloadable from petsc at the moment? > > Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jychang48 at gmail.com Fri Jan 22 16:57:45 2016 From: jychang48 at gmail.com (Justin Chang) Date: Fri, 22 Jan 2016 15:57:45 -0700 Subject: [petsc-users] Optimization methods in PETSc/TAO In-Reply-To: References: Message-ID: This was one of the citations provided: M. Ulbrich, "Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces", SIAM, 2011, Haven't looked into this in detail, but is what's described in that equivalent to the SNESVINEWTONSSLS? On Fri, Jan 22, 2016 at 3:42 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 4:27 PM, Justin Chang wrote: > >> Hi all, >> >> Consider the following problem: >> >> minimize 1/2 - >> subject to c >= 0 (P1) >> >> To solve (P1) using TAO, I recall that there were two recommended solvers >> to use: TRON and BLMVM >> >> I recently got reviews for this paper of mine that uses BLMVM and got >> hammered for this, as I quote, "convenient yet inadequate choice" of >> solver. >> > > If they did not back this up with a citation it is just empty snobbery, > not surprising from some quarters. > > >> It was suggested that I use either semi smooth Newton methods or >> projected Newton methods for the optimization problem. My question is, are >> these methodologies/solvers available currently within PETSc/TAO? >> > > You can Google TRON and BLMVM and they come up on the NEOS pages. BLMVM is > a gradient descent method, but > TRON is a Newton method, so trying it may silence the doubters. > > Matt > > >> 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year >> ago but it didn't seem to work. I believe I was told by one of the PETSc >> developers (Matt?) that this was not the one to use? >> >> 2) Is TRON a type of projected Newton method? I know it's an active-set >> Newton trust region, but is this a well-accepted high performing >> optimization method to use? >> >> I was also referred to ROL: https://trilinos.org/packages/rol but I am >> guessing this isn't accessible/downloadable from petsc at the moment? >> >> Thanks, >> Justin >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Jan 24 05:26:24 2016 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 24 Jan 2016 05:26:24 -0600 Subject: [petsc-users] PCFIELDSPLIT question In-Reply-To: References: Message-ID: On Fri, Jan 22, 2016 at 2:19 PM, Hom Nath Gharti wrote: > Dear all, > > I am new to PcFieldSplit. > > I have a matrix formed using MATMPIAIJ. Is it possible to use > PCFIELDSPLIT operations in this type of matrix? Or does it have to be > MATMPIBIJ or MATNEST format? > Yes, you can split AIJ. > If possible for MATMPIAIJ, could anybody provide me a simple example > or few steps? Variables in the equations are displacement vector, > scalar potential and pressure. > If you do not have a collocated discretization, then you have to use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetIS.html Thanks, Matt > Thanks for help. > > Hom Nath > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hng.email at gmail.com Sun Jan 24 18:14:34 2016 From: hng.email at gmail.com (Hom Nath Gharti) Date: Sun, 24 Jan 2016 19:14:34 -0500 Subject: [petsc-users] PCFIELDSPLIT question In-Reply-To: References: Message-ID: Thank you so much Matt! I will try. Hom Nath On Sun, Jan 24, 2016 at 6:26 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 2:19 PM, Hom Nath Gharti > wrote: >> >> Dear all, >> >> I am new to PcFieldSplit. >> >> I have a matrix formed using MATMPIAIJ. Is it possible to use >> PCFIELDSPLIT operations in this type of matrix? Or does it have to be >> MATMPIBIJ or MATNEST format? > > > Yes, you can split AIJ. > >> >> If possible for MATMPIAIJ, could anybody provide me a simple example >> or few steps? Variables in the equations are displacement vector, >> scalar potential and pressure. > > > If you do not have a collocated discretization, then you have to use > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetIS.html > > Thanks, > > Matt > >> >> Thanks for help. >> >> Hom Nath > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener From praveenpetsc at gmail.com Mon Jan 25 00:26:32 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Mon, 25 Jan 2016 11:56:32 +0530 Subject: [petsc-users] error message from GDB Message-ID: I am employing PETSc for DD in existing serial fortran code. the program ran for few seconds and showed segmentation fault core dumped. would anyone suggest how to fix this error message from GDB: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff64f7cc0 in PetscCheckPointer (ptr=0x15e00007fff, dtype=PETSC_OBJECT) at /home/praveen/petsc/src/sys/error/checkptr.c:106 106 PETSC_UNUSED volatile PetscClassId classid = ((PetscObject)ptr)->classid; -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Jan 25 00:41:07 2016 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 25 Jan 2016 00:41:07 -0600 Subject: [petsc-users] [petsc-maint] error message from GDB In-Reply-To: References: Message-ID: We will need to see a stack trace. You can use the debugger, -start_in_debugger, to get one. Also, always send the whole error message. Thanks, Matt On Mon, Jan 25, 2016 at 12:26 AM, praveen kumar wrote: > I am employing PETSc for DD in existing serial fortran code. the program > ran for few seconds and showed segmentation fault core dumped. would anyone > suggest how to fix this > error message from GDB: > > Program received signal SIGSEGV, Segmentation fault. > 0x00007ffff64f7cc0 in PetscCheckPointer (ptr=0x15e00007fff, > dtype=PETSC_OBJECT) at /home/praveen/petsc/src/sys/error/checkptr.c:106 > 106 PETSC_UNUSED volatile PetscClassId classid = > ((PetscObject)ptr)->classid; > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From praveenpetsc at gmail.com Mon Jan 25 04:26:40 2016 From: praveenpetsc at gmail.com (praveen kumar) Date: Mon, 25 Jan 2016 15:56:40 +0530 Subject: [petsc-users] [petsc-maint] error message from GDB In-Reply-To: References: Message-ID: Thanks Matt. I ran with -start_in_debugger and fixed the error. 
The code is running but I can't figure out why the results are wrong when compared with serial code. if you get time, please go through the code. it is a simple 2D conduction code and I?ve employed DMDAcreate2D. Thanks, Praveen On Mon, Jan 25, 2016 at 12:11 PM, Matthew Knepley wrote: > We will need to see a stack trace. You can use the debugger, > -start_in_debugger, to get one. > > Also, always send the whole error message. > > Thanks, > > Matt > > On Mon, Jan 25, 2016 at 12:26 AM, praveen kumar > wrote: > >> I am employing PETSc for DD in existing serial fortran code. the program >> ran for few seconds and showed segmentation fault core dumped. would anyone >> suggest how to fix this >> error message from GDB: >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x00007ffff64f7cc0 in PetscCheckPointer (ptr=0x15e00007fff, >> dtype=PETSC_OBJECT) at /home/praveen/petsc/src/sys/error/checkptr.c:106 >> 106 PETSC_UNUSED volatile PetscClassId classid = >> ((PetscObject)ptr)->classid; >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.F90 Type: text/x-fortran Size: 14263 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: input Type: application/octet-stream Size: 454 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makefile Type: application/octet-stream Size: 515 bytes Desc: not available URL: From zocca.marco at gmail.com Mon Jan 25 04:34:00 2016 From: zocca.marco at gmail.com (Marco Zocca) Date: Mon, 25 Jan 2016 11:34:00 +0100 Subject: [petsc-users] erratic bug with nested PETSc and SLEPc Message-ID: Dear all, I have a simple code with a matrix filling, assembly and output to stdout. This in turn is wrapped within a SLEPc bracket, and that in turn in a PETSc bracket, both called with default options (`XInitializeNoArguments`). I run this on my laptop, MPI comm size == 1. Issue: _sometimes_ the above crashes upon exit from the SLEPc bracket (i.e. after printing out the matrix), with an error code > 8000 . I haven't found this documented anywhere. It's funny because this doesn't happen with probability 1 and no conditions change (running from a makefile). Other times it simply works. In general, when using both PETSc and SLEPc functionality, is it enough to link to SLEPc alone? Does it take care of importing all of PETSc? Any hints re. this behaviour? Thank you in advance and kind regards, Marco From torquil at gmail.com Mon Jan 25 04:58:41 2016 From: torquil at gmail.com (=?UTF-8?Q?Torquil_Macdonald_S=c3=b8rensen?=) Date: Mon, 25 Jan 2016 11:58:41 +0100 Subject: [petsc-users] Question about TSSetIJacobian examples Message-ID: <56A5FFE1.6030402@gmail.com> Hi! I have been looking at some of the PETSc examples where TSSetIJacobian, and there is one thing which is unclear to me. Consider e.g. the example: http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex8.c.html In the function RoberJacobian(), CEJacobian(), OregoJacobian(), there are two matrix function arguments A and B. The matrix A is the one that is actually set in the code. 
My question is: what is the purpose of MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY); when A != B at the end of the function? How does that piece of code affect B? In the documentation of these functions it says that they are to be called after e.g. MatSetValues. But MatSetValues have not been called on B in those functions, so that's why I'm wondering what those lines are for. Best regards, Torquil S?rensen From jroman at dsic.upv.es Mon Jan 25 05:03:01 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 25 Jan 2016 12:03:01 +0100 Subject: [petsc-users] erratic bug with nested PETSc and SLEPc In-Reply-To: References: Message-ID: > El 25 ene 2016, a las 11:34, Marco Zocca escribi?: > > Dear all, > > I have a simple code with a matrix filling, assembly and output to stdout. > This in turn is wrapped within a SLEPc bracket, and that in turn in a > PETSc bracket, both called with default options > (`XInitializeNoArguments`). > > I run this on my laptop, MPI comm size == 1. > > Issue: _sometimes_ the above crashes upon exit from the SLEPc bracket > (i.e. after printing out the matrix), with an error code > 8000 . I > haven't found this documented anywhere. > It's funny because this doesn't happen with probability 1 and no > conditions change (running from a makefile). > > Other times it simply works. > > In general, when using both PETSc and SLEPc functionality, is it > enough to link to SLEPc alone? Does it take care of importing all of > PETSc? > > Any hints re. this behaviour? > > > Thank you in advance and kind regards, > > Marco SLEPc makefiles include PETSc makefiles. Generally you just need to include ${SLEPC_DIR}/lib/slepc/conf/slepc_common and then add e.g. ${SLEPC_EPS_LIB} in your link line, which in turn will add ${PETSC_KSP_LIB}. [Note: if you need other components of PETSc such as SNES or TS you may need to add these, but it is usually not necessary unless PETSc has been configured --with-single-library=0]. If you call both SlepcInitialize() and PetscInitialize(), in any order, it should work. It should work also with the NoArguments versions. So I don't know where the problem is. If you share a test code I could try to reproduce the problem. Jose From torquil at gmail.com Mon Jan 25 05:09:09 2016 From: torquil at gmail.com (=?UTF-8?Q?Torquil_Macdonald_S=c3=b8rensen?=) Date: Mon, 25 Jan 2016 12:09:09 +0100 Subject: [petsc-users] Question about TSSetIJacobian examples In-Reply-To: <56A5FFE1.6030402@gmail.com> References: <56A5FFE1.6030402@gmail.com> Message-ID: <56A60255.5000908@gmail.com> Sorry, I meant: what is the reason for MatAssemblyBegin/End being run for matrix A, in the case when A != B? Best regards, Torquil S?rensen On 25/01/16 11:58, Torquil Macdonald S?rensen wrote: > Hi! > > I have been looking at some of the PETSc examples where TSSetIJacobian, > and there is one thing which is unclear to me. Consider e.g. the example: > > http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex8.c.html > > In the function RoberJacobian(), CEJacobian(), OregoJacobian(), there > are two matrix function arguments A and B. The matrix A is the one that > is actually set in the code. My question is: what is the purpose of > > MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY); > > when A != B at the end of the function? How does that piece of code > affect B? In the documentation of these functions it says that they are > to be called after e.g. MatSetValues. 
But MatSetValues have not been > called on B in those functions, so that's why I'm wondering what those > lines are for. > > Best regards, > Torquil S?rensen > From hgbk2008 at gmail.com Mon Jan 25 11:13:58 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 25 Jan 2016 18:13:58 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: OK, let's come back to my problem. I got your point about the interaction between components in one block. In my case, the interaction is strong. As you said, I try this: ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); CHKERRQ(ierr); ksp_U = sub_ksp[0]; ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); ierr = PetscFree(sub_ksp); CHKERRQ(ierr); But it seems doesn't work. The output from -ksp_view shows that matrix passed to Hypre still has bs=1 KSP Object: (fieldsplit_u_) 8 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_u_) 8 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.25 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type PMIS HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: (fieldsplit_u_) 8 MPI processes type: mpiaij rows=792333, cols=792333 total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 30057 nodes, limit used is 5 In other test, I can see the block size bs=3 in the section of Mat Object Regardless the setup cost of Hypre AMG, I saw it gives quite a radical performance, providing that the material parameters does not vary strongly, and the geometry is regular enough. Giang On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui > wrote: > >> DO you mean the option pc_fieldsplit_block_size? In this thread: >> >> http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error >> > > No. "Block Size" is confusing on PETSc since it is used to do several > things. Here block size > is being used to split the matrix. You do not need this since you are > prescribing your splits. 
The > matrix block size is used two ways: > > 1) To indicate that matrix values come in logically dense blocks > > 2) To change the storage to match this logical arrangement > > After everything works, we can just indicate to the submatrix which is > extracted that it has a > certain block size. However, for the Laplacian I expect it not to matter. > > >> It assumes you have a constant number of fields at each grid point, am I >> right? However, my field split is not constant, like >> [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y >> u3_z p_3 u4_x u4_y u4_z] >> >> Subsequently the fieldsplit is >> [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z >> u4_x u4_y u4_z] >> [p_1 p_3] >> >> Then what is the option to set block size 3 for split 0? >> >> Sorry, I search several forum threads but cannot figure out the options >> as you said. >> >> >> >>> You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> work better with the block size set. However, if its the P2 Laplacian it >>> does not really matter since its uncoupled. >>> >>> Yes, I agree it's uncoupled with the other field, but the crucial factor >> defining the quality of the block preconditioner is the approximate >> inversion of individual block. I would merely try block Jacobi first, >> because it's quite simple. Nevertheless, fieldsplit implements other nice >> things, like Schur complement, etc. >> > > I think concepts are getting confused here. I was talking about the > interaction of components in one block (the P2 block). You > are talking about interaction between blocks. > > Thanks, > > Matt > > >> Giang >> >> >> >> On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley >> wrote: >> >>> On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >>> wrote: >>> >>>> Hi Matt >>>> I would rather like to set the block size for block P2 too. Why? >>>> >>>> Because in one of my test (for problem involves only [u_x u_y u_z]), >>>> the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >>>> increases to 140 if block size is 1 (see attached files). >>>> >>> >>> You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> work better with the block size set. However, if its the P2 Laplacian it >>> does not really matter since its uncoupled. >>> >>> This gives me the impression that AMG will give better inversion for >>>> "P2" block if I can set its block size to 3. Of course it's still an >>>> hypothesis but worth to try. >>>> >>>> Another question: In one of the Petsc presentation, you said the Hypre >>>> AMG does not scale well, because set up cost amortize the iterations. How >>>> is it quantified? and what is the memory overhead? >>>> >>> >>> I said the Hypre setup cost is not scalable, but it can be amortized >>> over the iterations. You can quantify this >>> just by looking at the PCSetUp time as your increase the number of >>> processes. I don't think they have a good >>> model for the memory usage, and if they do, I do not know what it is. >>> However, generally Hypre takes more >>> memory than the agglomeration MG like ML or GAMG. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> >>>> Giang >>>> >>>> On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>>> >>>>> Hoang Giang Bui writes: >>>>> >>>>> > Why P2/P2 is not for co-located discretization? >>>>> >>>>> Matt typed "P2/P2" when me meant "P2/P1". 
>>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 25 11:34:01 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 25 Jan 2016 11:34:01 -0600 Subject: [petsc-users] Question about TSSetIJacobian examples In-Reply-To: <56A5FFE1.6030402@gmail.com> References: <56A5FFE1.6030402@gmail.com> Message-ID: <5BE437F5-F68D-454D-BA55-345581C53C63@mcs.anl.gov> The reason for the MatAssembly... on A when A is not B is when using a matrix-free A. For example -snes_mf_operator Recall that matrix free matrix vector products with finite differences are computed with F(U + alpha*dx) - F(U) J(U)*dx = ----------------------------------- alpha*dx dx, of course, is different for each call to the multiply. Each new Newton step uses a new U. The MatAssemblyBegin/End() is when the matrix free matrix A is informed of the new U value (otherwise even with new Newton steps the original U from the first Newton step would be used forever); this is handled internally by the MatCreateSNESMF() object. Barry > On Jan 25, 2016, at 4:58 AM, Torquil Macdonald S?rensen wrote: > > Hi! > > I have been looking at some of the PETSc examples where TSSetIJacobian, > and there is one thing which is unclear to me. Consider e.g. the example: > > http://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex8.c.html > > In the function RoberJacobian(), CEJacobian(), OregoJacobian(), there > are two matrix function arguments A and B. The matrix A is the one that > is actually set in the code. My question is: what is the purpose of > > MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY); > > when A != B at the end of the function? How does that piece of code > affect B? In the documentation of these functions it says that they are > to be called after e.g. MatSetValues. But MatSetValues have not been > called on B in those functions, so that's why I'm wondering what those > lines are for. > > Best regards, > Torquil S?rensen > From kalan019 at umn.edu Mon Jan 25 12:15:04 2016 From: kalan019 at umn.edu (Vasileios Kalantzis) Date: Mon, 25 Jan 2016 12:15:04 -0600 Subject: [petsc-users] MatSetValues Message-ID: Dear all, I am trying to form an approximation of the Schur complement S = C-E'*(B\E) of a matrix A = [B, E ; E', C]. Matrix C is stored as a distributed Mat object while matrices B and E are locally distributed to each processor (the block partitioning comes from a Domain Decomposition point-of-view). All matrices B, E, and C are sparse. I already have a sparse version of -E'*(B\E) computed. Moreover, -E'*(B\E) is block-diagonal. The only issue now is how to merge (add) C and -E'*(B\E). The way i do the addition right now is based on checking every entry of -E'*(B\E) and, if larger than a threshold value, add it to C using the MatSetValue routine. The above is being done in parallel for each diagonal block of -E'*(B\E). The code works fine numerically but my approach is too slow if -E'*(B\E) is not highly sparse. 
I know that I can set many entries together by using the MatSetValues routine but I am not sure how to do it because the sparsity pattern of each column of -E'*(B\E) differs. Maybe I can assemble the sparsified Schur complement column-by-column using MatSetValues but is there any other idea perhaps? Thanks ! :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Jan 25 12:43:26 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 25 Jan 2016 12:43:26 -0600 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui wrote: > > OK, let's come back to my problem. I got your point about the interaction between components in one block. In my case, the interaction is strong. > > As you said, I try this: > > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); CHKERRQ(ierr); > ksp_U = sub_ksp[0]; > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); > > But it seems doesn't work. The output from -ksp_view shows that matrix passed to Hypre still has bs=1 Hmm, this is strange. MatSetBlockSize() should have either set the block size to 3 or generated an error. Can you run in the debugger on one process and put a break point in MatSetBlockSize() and see what it is setting the block size to. Then in PCSetUp_hypre() you can see what it is passing to hypre as the block size and maybe figure out how it becomes 1. Barry > > KSP Object: (fieldsplit_u_) 8 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_u_) 8 MPI processes > type: hypre > HYPRE BoomerAMG preconditioning > HYPRE BoomerAMG: Cycle type V > HYPRE BoomerAMG: Maximum number of levels 25 > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > HYPRE BoomerAMG: Threshold for strong coupling 0.25 > HYPRE BoomerAMG: Interpolation truncation factor 0 > HYPRE BoomerAMG: Interpolation: max elements per row 0 > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > HYPRE BoomerAMG: Maximum row sums 0.9 > HYPRE BoomerAMG: Sweeps down 1 > HYPRE BoomerAMG: Sweeps up 1 > HYPRE BoomerAMG: Sweeps on coarse 1 > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > HYPRE BoomerAMG: Relax weight (all) 1 > HYPRE BoomerAMG: Outer relax weight (all) 1 > HYPRE BoomerAMG: Using CF-relaxation > HYPRE BoomerAMG: Measure type local > HYPRE BoomerAMG: Coarsen type PMIS > HYPRE BoomerAMG: Interpolation type classical > linear system matrix = precond matrix: > Mat Object: (fieldsplit_u_) 8 MPI processes > type: mpiaij > rows=792333, cols=792333 > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 30057 nodes, limit used is 5 > > In other test, I can see the block size bs=3 in the section of Mat Object > > 
Regardless the setup cost of Hypre AMG, I saw it gives quite a radical performance, providing that the material parameters does not vary strongly, and the geometry is regular enough. > > > Giang > > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui wrote: > DO you mean the option pc_fieldsplit_block_size? In this thread: > > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error > > No. "Block Size" is confusing on PETSc since it is used to do several things. Here block size > is being used to split the matrix. You do not need this since you are prescribing your splits. The > matrix block size is used two ways: > > 1) To indicate that matrix values come in logically dense blocks > > 2) To change the storage to match this logical arrangement > > After everything works, we can just indicate to the submatrix which is extracted that it has a > certain block size. However, for the Laplacian I expect it not to matter. > > It assumes you have a constant number of fields at each grid point, am I right? However, my field split is not constant, like > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y u3_z p_3 u4_x u4_y u4_z] > > Subsequently the fieldsplit is > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z u4_x u4_y u4_z] > [p_1 p_3] > > Then what is the option to set block size 3 for split 0? > > Sorry, I search several forum threads but cannot figure out the options as you said. > > > > You can still do that. It can be done with options once the decomposition is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it does not really matter since its uncoupled. > > Yes, I agree it's uncoupled with the other field, but the crucial factor defining the quality of the block preconditioner is the approximate inversion of individual block. I would merely try block Jacobi first, because it's quite simple. Nevertheless, fieldsplit implements other nice things, like Schur complement, etc. > > I think concepts are getting confused here. I was talking about the interaction of components in one block (the P2 block). You > are talking about interaction between blocks. > > Thanks, > > Matt > > Giang > > > > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley wrote: > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui wrote: > Hi Matt > I would rather like to set the block size for block P2 too. Why? > > Because in one of my test (for problem involves only [u_x u_y u_z]), the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it increases to 140 if block size is 1 (see attached files). > > You can still do that. It can be done with options once the decomposition is working. Its true that these solvers > work better with the block size set. However, if its the P2 Laplacian it does not really matter since its uncoupled. > > This gives me the impression that AMG will give better inversion for "P2" block if I can set its block size to 3. Of course it's still an hypothesis but worth to try. > > Another question: In one of the Petsc presentation, you said the Hypre AMG does not scale well, because set up cost amortize the iterations. How is it quantified? and what is the memory overhead? > > I said the Hypre setup cost is not scalable, but it can be amortized over the iterations. You can quantify this > just by looking at the PCSetUp time as your increase the number of processes. I don't think they have a good > model for the memory usage, and if they do, I do not know what it is. 
However, generally Hypre takes more > memory than the agglomeration MG like ML or GAMG. > > Thanks, > > Matt > > > Giang > > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Why P2/P2 is not for co-located discretization? > > Matt typed "P2/P2" when me meant "P2/P1". > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From bsmith at mcs.anl.gov Mon Jan 25 13:07:33 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 25 Jan 2016 13:07:33 -0600 Subject: [petsc-users] MatSetValues In-Reply-To: References: Message-ID: <73951966-F40A-4E10-99B6-8D925EAD78D5@mcs.anl.gov> > On Jan 25, 2016, at 12:15 PM, Vasileios Kalantzis wrote: > > Dear all, > > I am trying to form an approximation of the > Schur complement S = C-E'*(B\E) of a matrix > A = [B, E ; E', C]. Matrix C is stored as a > distributed Mat object while matrices B and E > are locally distributed to each processor (the > block partitioning comes from a Domain > Decomposition point-of-view). All matrices B, E, > and C are sparse. > > I already have a sparse version of -E'*(B\E) > computed. Moreover, -E'*(B\E) is block-diagonal. > The only issue now is how to merge (add) C and > -E'*(B\E). The way i do the addition right now is > based on checking every entry of -E'*(B\E) and, > if larger than a threshold value, add it to C using > the MatSetValue routine. The above is being done > in parallel for each diagonal block of -E'*(B\E). > > The code works fine numerically but my approach > is too slow if -E'*(B\E) is not highly sparse. This is a guess, but if you inserting new nonzero locations into C with MatSetValues() then this will be very slow. Are you inserting new locations? If so, here is what you need to do. Sweep through the rows of C/-E'*(B\E) determining the number of nonzeros that will be in the result and then use MatCreateMPIAIJ() or MatMPIAIJSetPreallocation() to preallocate the space in a new matrix, say D. Then sweep through all the rows again actually calling the MatSetValues() and put the entries into D. Switching from non-preallocation to preallocation will speed it up dramatically (factors of 100's or more) if you were inserting new locations. Barry > > I know that I can set many entries together by > using the MatSetValues routine but I am not > sure how to do it because the sparsity pattern of > each column of -E'*(B\E) differs. Maybe I can > assemble the sparsified Schur complement > column-by-column using MatSetValues but is > there any other idea perhaps? > > Thanks ! :) From kalan019 at umn.edu Mon Jan 25 13:37:58 2016 From: kalan019 at umn.edu (Vasileios Kalantzis) Date: Mon, 25 Jan 2016 13:37:58 -0600 Subject: [petsc-users] MatSetValues In-Reply-To: <73951966-F40A-4E10-99B6-8D925EAD78D5@mcs.anl.gov> References: <73951966-F40A-4E10-99B6-8D925EAD78D5@mcs.anl.gov> Message-ID: Hi Barry, yes, I am inserting new locations. I was actually copying matrix C to a new matrix D, and then I was inserting the "thresholded" values -E*(B\E) in this matrix D. I was pretty much sure that the missing pre-allocation was the reason for the slow code -- when I commented out the MatSetValues() part (i.e. 
I approximated the Schur complement only by matrix C), this part was much much faster. I will definitely follow your suggestion -- Thanks!!! On Mon, Jan 25, 2016 at 1:07 PM, Barry Smith wrote: > > > On Jan 25, 2016, at 12:15 PM, Vasileios Kalantzis > wrote: > > > > Dear all, > > > > I am trying to form an approximation of the > > Schur complement S = C-E'*(B\E) of a matrix > > A = [B, E ; E', C]. Matrix C is stored as a > > distributed Mat object while matrices B and E > > are locally distributed to each processor (the > > block partitioning comes from a Domain > > Decomposition point-of-view). All matrices B, E, > > and C are sparse. > > > > I already have a sparse version of -E'*(B\E) > > computed. Moreover, -E'*(B\E) is block-diagonal. > > The only issue now is how to merge (add) C and > > -E'*(B\E). The way i do the addition right now is > > based on checking every entry of -E'*(B\E) and, > > if larger than a threshold value, add it to C using > > the MatSetValue routine. The above is being done > > in parallel for each diagonal block of -E'*(B\E). > > > > The code works fine numerically but my approach > > is too slow if -E'*(B\E) is not highly sparse. > > This is a guess, but if you inserting new nonzero locations into C with > MatSetValues() then this will be very slow. Are you inserting new locations? > > If so, here is what you need to do. Sweep through the rows of > C/-E'*(B\E) determining the number of nonzeros that will be in the result > and then use MatCreateMPIAIJ() or MatMPIAIJSetPreallocation() to > preallocate the space in a new matrix, say D. Then sweep through all the > rows again actually calling the MatSetValues() and put the entries into D. > Switching from non-preallocation to preallocation will speed it up > dramatically (factors of 100's or more) if you were inserting new locations. > > Barry > > > > > > > I know that I can set many entries together by > > using the MatSetValues routine but I am not > > sure how to do it because the sparsity pattern of > > each column of -E'*(B\E) differs. Maybe I can > > assemble the sparsified Schur complement > > column-by-column using MatSetValues but is > > there any other idea perhaps? > > > > Thanks ! :) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Jan 26 02:58:20 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 26 Jan 2016 09:58:20 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Hi I assert this line to the hypre.c to see what block size it set to /* special case for BoomerAMG */ if (jac->setup == HYPRE_BoomerAMGSetup) { ierr = MatGetBlockSize(pc->pmat,&bs);CHKERRQ(ierr); // check block size passed to HYPRE PetscPrintf(PetscObjectComm((PetscObject)pc),"the block size passed to HYPRE is %d\n",bs); if (bs > 1) PetscStackCallStandard(HYPRE_BoomerAMGSetNumFunctions,(jac->hsolver,bs)); } It shows that the passing block size is 1. So my hypothesis is correct. In the manual of MatSetBlockSize ( http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetBlockSize.html), it has to be called before MatSetUp. Hence I guess the matrix passed to HYPRE is created before I set the block size. 
Given that, I set the block size after the call to PCFieldSplitSetIS ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr); /* Set block size for sub-matrix, */ ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); CHKERRQ(ierr); ksp_U = sub_ksp[0]; ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); I guess the sub-matrices is created at PCFieldSplitSetIS. If that's correct then it's not possible to set the block size this way. Giang On Mon, Jan 25, 2016 at 7:43 PM, Barry Smith wrote: > > > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui > wrote: > > > > OK, let's come back to my problem. I got your point about the > interaction between components in one block. In my case, the interaction is > strong. > > > > As you said, I try this: > > > > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); > > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); > CHKERRQ(ierr); > > ksp_U = sub_ksp[0]; > > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); > > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); > > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); > > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); > > > > But it seems doesn't work. The output from -ksp_view shows that matrix > passed to Hypre still has bs=1 > > Hmm, this is strange. MatSetBlockSize() should have either set the > block size to 3 or generated an error. Can you run in the debugger on one > process and put a break point in MatSetBlockSize() and see what it is > setting the block size to. Then in PCSetUp_hypre() you can see what it is > passing to hypre as the block size and maybe figure out how it becomes 1. > > Barry > > > > > > KSP Object: (fieldsplit_u_) 8 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (fieldsplit_u_) 8 MPI processes > > type: hypre > > HYPRE BoomerAMG preconditioning > > HYPRE BoomerAMG: Cycle type V > > HYPRE BoomerAMG: Maximum number of levels 25 > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > HYPRE BoomerAMG: Threshold for strong coupling 0.25 > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > HYPRE BoomerAMG: Maximum row sums 0.9 > > HYPRE BoomerAMG: Sweeps down 1 > > HYPRE BoomerAMG: Sweeps up 1 > > HYPRE BoomerAMG: Sweeps on coarse 1 > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > HYPRE BoomerAMG: Relax weight (all) 1 > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > HYPRE BoomerAMG: Using CF-relaxation > > HYPRE BoomerAMG: Measure type local > > HYPRE BoomerAMG: Coarsen type PMIS > > HYPRE BoomerAMG: Interpolation type classical > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_u_) 8 MPI processes > > type: mpiaij > > rows=792333, cols=792333 > > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 30057 nodes, limit > used is 5 > > > > In 
other test, I can see the block size bs=3 in the section of Mat Object > > > > Regardless the setup cost of Hypre AMG, I saw it gives quite a radical > performance, providing that the material parameters does not vary strongly, > and the geometry is regular enough. > > > > > > Giang > > > > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui > wrote: > > DO you mean the option pc_fieldsplit_block_size? In this thread: > > > > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error > > > > No. "Block Size" is confusing on PETSc since it is used to do several > things. Here block size > > is being used to split the matrix. You do not need this since you are > prescribing your splits. The > > matrix block size is used two ways: > > > > 1) To indicate that matrix values come in logically dense blocks > > > > 2) To change the storage to match this logical arrangement > > > > After everything works, we can just indicate to the submatrix which is > extracted that it has a > > certain block size. However, for the Laplacian I expect it not to matter. > > > > It assumes you have a constant number of fields at each grid point, am I > right? However, my field split is not constant, like > > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y > u3_z p_3 u4_x u4_y u4_z] > > > > Subsequently the fieldsplit is > > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z > u4_x u4_y u4_z] > > [p_1 p_3] > > > > Then what is the option to set block size 3 for split 0? > > > > Sorry, I search several forum threads but cannot figure out the options > as you said. > > > > > > > > You can still do that. It can be done with options once the > decomposition is working. Its true that these solvers > > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > > > Yes, I agree it's uncoupled with the other field, but the crucial factor > defining the quality of the block preconditioner is the approximate > inversion of individual block. I would merely try block Jacobi first, > because it's quite simple. Nevertheless, fieldsplit implements other nice > things, like Schur complement, etc. > > > > I think concepts are getting confused here. I was talking about the > interaction of components in one block (the P2 block). You > > are talking about interaction between blocks. > > > > Thanks, > > > > Matt > > > > Giang > > > > > > > > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley > wrote: > > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui > wrote: > > Hi Matt > > I would rather like to set the block size for block P2 too. Why? > > > > Because in one of my test (for problem involves only [u_x u_y u_z]), the > gmres + Hypre AMG converges in 50 steps with block size 3, whereby it > increases to 140 if block size is 1 (see attached files). > > > > You can still do that. It can be done with options once the > decomposition is working. Its true that these solvers > > work better with the block size set. However, if its the P2 Laplacian it > does not really matter since its uncoupled. > > > > This gives me the impression that AMG will give better inversion for > "P2" block if I can set its block size to 3. Of course it's still an > hypothesis but worth to try. > > > > Another question: In one of the Petsc presentation, you said the Hypre > AMG does not scale well, because set up cost amortize the iterations. How > is it quantified? and what is the memory overhead? 
> > > > I said the Hypre setup cost is not scalable, but it can be amortized > over the iterations. You can quantify this > > just by looking at the PCSetUp time as your increase the number of > processes. I don't think they have a good > > model for the memory usage, and if they do, I do not know what it is. > However, generally Hypre takes more > > memory than the agglomeration MG like ML or GAMG. > > > > Thanks, > > > > Matt > > > > > > Giang > > > > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: > > Hoang Giang Bui writes: > > > > > Why P2/P2 is not for co-located discretization? > > > > Matt typed "P2/P2" when me meant "P2/P1". > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Jan 26 08:01:49 2016 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 26 Jan 2016 09:01:49 -0500 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: On Tue, Jan 26, 2016 at 3:58 AM, Hoang Giang Bui wrote: > Hi > > I assert this line to the hypre.c to see what block size it set to > > /* special case for BoomerAMG */ > if (jac->setup == HYPRE_BoomerAMGSetup) { > ierr = MatGetBlockSize(pc->pmat,&bs);CHKERRQ(ierr); > > // check block size passed to HYPRE > PetscPrintf(PetscObjectComm((PetscObject)pc),"the block size passed to > HYPRE is %d\n",bs); > > if (bs > 1) > PetscStackCallStandard(HYPRE_BoomerAMGSetNumFunctions,(jac->hsolver,bs)); > } > > It shows that the passing block size is 1. So my hypothesis is correct. > > In the manual of MatSetBlockSize ( > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetBlockSize.html), > it has to be called before MatSetUp. Hence I guess the matrix passed to > HYPRE is created before I set the block size. Given that, I set the block > size after the call to PCFieldSplitSetIS > > ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr); > ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr); > > /* > Set block size for sub-matrix, > */ > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); > CHKERRQ(ierr); > ksp_U = sub_ksp[0]; > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); > > I guess the sub-matrices is created at PCFieldSplitSetIS. If that's > correct then it's not possible to set the block size this way. > You set the block size in the ISs that you give to FieldSplit. FieldSplit will give it to the matrices. > > > Giang > > On Mon, Jan 25, 2016 at 7:43 PM, Barry Smith wrote: > >> >> > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui >> wrote: >> > >> > OK, let's come back to my problem. I got your point about the >> interaction between components in one block. In my case, the interaction is >> strong. 
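A sketch of the advice above about putting the block size on the index sets rather than on the extracted submatrices: if the IS handed to PCFieldSplitSetIS() already carries bs=3, the "u" submatrix created by fieldsplit should inherit it, and PCSetUp_hypre() should then pick that up and pass it to HYPRE_BoomerAMGSetNumFunctions(), as in the hypre.c snippet above. The IS_u/IS_p names are the ones from that snippet, and the sketch assumes IS_u lists the displacement dofs interlaced as (u_x,u_y,u_z) so its local length is divisible by 3.

  ierr = ISSetBlockSize(IS_u, 3); CHKERRQ(ierr);   /* displacement dofs come in (x,y,z) triples */
  ierr = ISSetBlockSize(IS_p, 1); CHKERRQ(ierr);   /* scalar pressure dofs */
  ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr);

Done this way there is no need to call MatSetBlockSize() on A_U/P_U after PCFieldSplitGetSubKSP(), which comes too late because the submatrices have already been created.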
>> > >> > As you said, I try this: >> > >> > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); >> > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); >> CHKERRQ(ierr); >> > ksp_U = sub_ksp[0]; >> > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); >> > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); >> > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); >> > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); >> > >> > But it seems doesn't work. The output from -ksp_view shows that matrix >> passed to Hypre still has bs=1 >> >> Hmm, this is strange. MatSetBlockSize() should have either set the >> block size to 3 or generated an error. Can you run in the debugger on one >> process and put a break point in MatSetBlockSize() and see what it is >> setting the block size to. Then in PCSetUp_hypre() you can see what it is >> passing to hypre as the block size and maybe figure out how it becomes 1. >> >> Barry >> >> >> > >> > KSP Object: (fieldsplit_u_) 8 MPI processes >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >> > left preconditioning >> > using NONE norm type for convergence test >> > PC Object: (fieldsplit_u_) 8 MPI processes >> > type: hypre >> > HYPRE BoomerAMG preconditioning >> > HYPRE BoomerAMG: Cycle type V >> > HYPRE BoomerAMG: Maximum number of levels 25 >> > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 >> > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 >> > HYPRE BoomerAMG: Threshold for strong coupling 0.25 >> > HYPRE BoomerAMG: Interpolation truncation factor 0 >> > HYPRE BoomerAMG: Interpolation: max elements per row 0 >> > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 >> > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 >> > HYPRE BoomerAMG: Maximum row sums 0.9 >> > HYPRE BoomerAMG: Sweeps down 1 >> > HYPRE BoomerAMG: Sweeps up 1 >> > HYPRE BoomerAMG: Sweeps on coarse 1 >> > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi >> > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi >> > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination >> > HYPRE BoomerAMG: Relax weight (all) 1 >> > HYPRE BoomerAMG: Outer relax weight (all) 1 >> > HYPRE BoomerAMG: Using CF-relaxation >> > HYPRE BoomerAMG: Measure type local >> > HYPRE BoomerAMG: Coarsen type PMIS >> > HYPRE BoomerAMG: Interpolation type classical >> > linear system matrix = precond matrix: >> > Mat Object: (fieldsplit_u_) 8 MPI processes >> > type: mpiaij >> > rows=792333, cols=792333 >> > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 >> > total number of mallocs used during MatSetValues calls =0 >> > using I-node (on process 0) routines: found 30057 nodes, >> limit used is 5 >> > >> > In other test, I can see the block size bs=3 in the section of Mat >> Object >> > >> > Regardless the setup cost of Hypre AMG, I saw it gives quite a radical >> performance, providing that the material parameters does not vary strongly, >> and the geometry is regular enough. >> > >> > >> > Giang >> > >> > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui >> wrote: >> > DO you mean the option pc_fieldsplit_block_size? In this thread: >> > >> > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error >> > >> > No. "Block Size" is confusing on PETSc since it is used to do several >> things. Here block size >> > is being used to split the matrix. You do not need this since you are >> prescribing your splits. 
The >> > matrix block size is used two ways: >> > >> > 1) To indicate that matrix values come in logically dense blocks >> > >> > 2) To change the storage to match this logical arrangement >> > >> > After everything works, we can just indicate to the submatrix which is >> extracted that it has a >> > certain block size. However, for the Laplacian I expect it not to >> matter. >> > >> > It assumes you have a constant number of fields at each grid point, am >> I right? However, my field split is not constant, like >> > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y >> u3_z p_3 u4_x u4_y u4_z] >> > >> > Subsequently the fieldsplit is >> > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z >> u4_x u4_y u4_z] >> > [p_1 p_3] >> > >> > Then what is the option to set block size 3 for split 0? >> > >> > Sorry, I search several forum threads but cannot figure out the options >> as you said. >> > >> > >> > >> > You can still do that. It can be done with options once the >> decomposition is working. Its true that these solvers >> > work better with the block size set. However, if its the P2 Laplacian >> it does not really matter since its uncoupled. >> > >> > Yes, I agree it's uncoupled with the other field, but the crucial >> factor defining the quality of the block preconditioner is the approximate >> inversion of individual block. I would merely try block Jacobi first, >> because it's quite simple. Nevertheless, fieldsplit implements other nice >> things, like Schur complement, etc. >> > >> > I think concepts are getting confused here. I was talking about the >> interaction of components in one block (the P2 block). You >> > are talking about interaction between blocks. >> > >> > Thanks, >> > >> > Matt >> > >> > Giang >> > >> > >> > >> > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley >> wrote: >> > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >> wrote: >> > Hi Matt >> > I would rather like to set the block size for block P2 too. Why? >> > >> > Because in one of my test (for problem involves only [u_x u_y u_z]), >> the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >> increases to 140 if block size is 1 (see attached files). >> > >> > You can still do that. It can be done with options once the >> decomposition is working. Its true that these solvers >> > work better with the block size set. However, if its the P2 Laplacian >> it does not really matter since its uncoupled. >> > >> > This gives me the impression that AMG will give better inversion for >> "P2" block if I can set its block size to 3. Of course it's still an >> hypothesis but worth to try. >> > >> > Another question: In one of the Petsc presentation, you said the Hypre >> AMG does not scale well, because set up cost amortize the iterations. How >> is it quantified? and what is the memory overhead? >> > >> > I said the Hypre setup cost is not scalable, but it can be amortized >> over the iterations. You can quantify this >> > just by looking at the PCSetUp time as your increase the number of >> processes. I don't think they have a good >> > model for the memory usage, and if they do, I do not know what it is. >> However, generally Hypre takes more >> > memory than the agglomeration MG like ML or GAMG. >> > >> > Thanks, >> > >> > Matt >> > >> > >> > Giang >> > >> > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >> > Hoang Giang Bui writes: >> > >> > > Why P2/P2 is not for co-located discretization? >> > >> > Matt typed "P2/P2" when me meant "P2/P1". 
>> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Tue Jan 26 11:41:08 2016 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 26 Jan 2016 18:41:08 +0100 Subject: [petsc-users] Why use MATMPIBAIJ? In-Reply-To: References: <2301E44E-41A4-42B7-A97C-BF9C9D9AE7DC@mcs.anl.gov> <87fuy07zvi.fsf@jedbrown.org> <87si1ug8hl.fsf@jedbrown.org> Message-ID: Clear enough. Thank you :-) Giang On Tue, Jan 26, 2016 at 3:01 PM, Mark Adams wrote: > > > On Tue, Jan 26, 2016 at 3:58 AM, Hoang Giang Bui > wrote: > >> Hi >> >> I assert this line to the hypre.c to see what block size it set to >> >> /* special case for BoomerAMG */ >> if (jac->setup == HYPRE_BoomerAMGSetup) { >> ierr = MatGetBlockSize(pc->pmat,&bs);CHKERRQ(ierr); >> >> // check block size passed to HYPRE >> PetscPrintf(PetscObjectComm((PetscObject)pc),"the block size passed >> to HYPRE is %d\n",bs); >> >> if (bs > 1) >> PetscStackCallStandard(HYPRE_BoomerAMGSetNumFunctions,(jac->hsolver,bs)); >> } >> >> It shows that the passing block size is 1. So my hypothesis is correct. >> >> In the manual of MatSetBlockSize ( >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetBlockSize.html), >> it has to be called before MatSetUp. Hence I guess the matrix passed to >> HYPRE is created before I set the block size. Given that, I set the block >> size after the call to PCFieldSplitSetIS >> >> ierr = PCFieldSplitSetIS(pc, "u", IS_u); CHKERRQ(ierr); >> ierr = PCFieldSplitSetIS(pc, "p", IS_p); CHKERRQ(ierr); >> >> /* >> Set block size for sub-matrix, >> */ >> ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); >> CHKERRQ(ierr); >> ksp_U = sub_ksp[0]; >> ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); >> ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); >> ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); >> >> I guess the sub-matrices is created at PCFieldSplitSetIS. If that's >> correct then it's not possible to set the block size this way. >> > > You set the block size in the ISs that you give to FieldSplit. FieldSplit > will give it to the matrices. > > >> >> >> Giang >> >> On Mon, Jan 25, 2016 at 7:43 PM, Barry Smith wrote: >> >>> >>> > On Jan 25, 2016, at 11:13 AM, Hoang Giang Bui >>> wrote: >>> > >>> > OK, let's come back to my problem. I got your point about the >>> interaction between components in one block. In my case, the interaction is >>> strong. >>> > >>> > As you said, I try this: >>> > >>> > ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr); >>> > ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &sub_ksp); >>> CHKERRQ(ierr); >>> > ksp_U = sub_ksp[0]; >>> > ierr = KSPGetOperators(ksp_U, &A_U, &P_U); CHKERRQ(ierr); >>> > ierr = MatSetBlockSize(A_U, 3); CHKERRQ(ierr); >>> > ierr = MatSetBlockSize(P_U, 3); CHKERRQ(ierr); >>> > ierr = PetscFree(sub_ksp); CHKERRQ(ierr); >>> > >>> > But it seems doesn't work. The output from -ksp_view shows that matrix >>> passed to Hypre still has bs=1 >>> >>> Hmm, this is strange. MatSetBlockSize() should have either set the >>> block size to 3 or generated an error. 
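One way to see what actually got set is to query the submatrix directly right before the solve (a small sketch, reusing the A_U handle obtained from PCFieldSplitGetSubKSP() in the snippet above):

  PetscInt bs;
  ierr = MatGetBlockSize(A_U, &bs); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "block size of the u submatrix: %D\n", bs); CHKERRQ(ierr);

If this prints 1, the MatSetBlockSize() call never took effect on the matrix that ends up in PCSetUp_hypre().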
Can you run in the debugger on one >>> process and put a break point in MatSetBlockSize() and see what it is >>> setting the block size to. Then in PCSetUp_hypre() you can see what it is >>> passing to hypre as the block size and maybe figure out how it becomes 1. >>> >>> Barry >>> >>> >>> > >>> > KSP Object: (fieldsplit_u_) 8 MPI processes >>> > type: preonly >>> > maximum iterations=10000, initial guess is zero >>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 >>> > left preconditioning >>> > using NONE norm type for convergence test >>> > PC Object: (fieldsplit_u_) 8 MPI processes >>> > type: hypre >>> > HYPRE BoomerAMG preconditioning >>> > HYPRE BoomerAMG: Cycle type V >>> > HYPRE BoomerAMG: Maximum number of levels 25 >>> > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 >>> > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 >>> > HYPRE BoomerAMG: Threshold for strong coupling 0.25 >>> > HYPRE BoomerAMG: Interpolation truncation factor 0 >>> > HYPRE BoomerAMG: Interpolation: max elements per row 0 >>> > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 >>> > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 >>> > HYPRE BoomerAMG: Maximum row sums 0.9 >>> > HYPRE BoomerAMG: Sweeps down 1 >>> > HYPRE BoomerAMG: Sweeps up 1 >>> > HYPRE BoomerAMG: Sweeps on coarse 1 >>> > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi >>> > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi >>> > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination >>> > HYPRE BoomerAMG: Relax weight (all) 1 >>> > HYPRE BoomerAMG: Outer relax weight (all) 1 >>> > HYPRE BoomerAMG: Using CF-relaxation >>> > HYPRE BoomerAMG: Measure type local >>> > HYPRE BoomerAMG: Coarsen type PMIS >>> > HYPRE BoomerAMG: Interpolation type classical >>> > linear system matrix = precond matrix: >>> > Mat Object: (fieldsplit_u_) 8 MPI processes >>> > type: mpiaij >>> > rows=792333, cols=792333 >>> > total: nonzeros=1.39004e+08, allocated nonzeros=1.39004e+08 >>> > total number of mallocs used during MatSetValues calls =0 >>> > using I-node (on process 0) routines: found 30057 nodes, >>> limit used is 5 >>> > >>> > In other test, I can see the block size bs=3 in the section of Mat >>> Object >>> > >>> > Regardless the setup cost of Hypre AMG, I saw it gives quite a radical >>> performance, providing that the material parameters does not vary strongly, >>> and the geometry is regular enough. >>> > >>> > >>> > Giang >>> > >>> > On Fri, Jan 22, 2016 at 2:57 PM, Matthew Knepley >>> wrote: >>> > On Fri, Jan 22, 2016 at 7:27 AM, Hoang Giang Bui >>> wrote: >>> > DO you mean the option pc_fieldsplit_block_size? In this thread: >>> > >>> > http://petsc-users.mcs.anl.narkive.com/qSHIOFhh/fieldsplit-error >>> > >>> > No. "Block Size" is confusing on PETSc since it is used to do several >>> things. Here block size >>> > is being used to split the matrix. You do not need this since you are >>> prescribing your splits. The >>> > matrix block size is used two ways: >>> > >>> > 1) To indicate that matrix values come in logically dense blocks >>> > >>> > 2) To change the storage to match this logical arrangement >>> > >>> > After everything works, we can just indicate to the submatrix which is >>> extracted that it has a >>> > certain block size. However, for the Laplacian I expect it not to >>> matter. >>> > >>> > It assumes you have a constant number of fields at each grid point, am >>> I right? 
However, my field split is not constant, like >>> > [u1_x u1_y u1_z p_1 u2_x u2_y u2_z u3_x u3_y >>> u3_z p_3 u4_x u4_y u4_z] >>> > >>> > Subsequently the fieldsplit is >>> > [u1_x u1_y u1_z u2_x u2_y u2_z u3_x u3_y u3_z >>> u4_x u4_y u4_z] >>> > [p_1 p_3] >>> > >>> > Then what is the option to set block size 3 for split 0? >>> > >>> > Sorry, I search several forum threads but cannot figure out the >>> options as you said. >>> > >>> > >>> > >>> > You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> > work better with the block size set. However, if its the P2 Laplacian >>> it does not really matter since its uncoupled. >>> > >>> > Yes, I agree it's uncoupled with the other field, but the crucial >>> factor defining the quality of the block preconditioner is the approximate >>> inversion of individual block. I would merely try block Jacobi first, >>> because it's quite simple. Nevertheless, fieldsplit implements other nice >>> things, like Schur complement, etc. >>> > >>> > I think concepts are getting confused here. I was talking about the >>> interaction of components in one block (the P2 block). You >>> > are talking about interaction between blocks. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > Giang >>> > >>> > >>> > >>> > On Fri, Jan 22, 2016 at 11:15 AM, Matthew Knepley >>> wrote: >>> > On Fri, Jan 22, 2016 at 3:40 AM, Hoang Giang Bui >>> wrote: >>> > Hi Matt >>> > I would rather like to set the block size for block P2 too. Why? >>> > >>> > Because in one of my test (for problem involves only [u_x u_y u_z]), >>> the gmres + Hypre AMG converges in 50 steps with block size 3, whereby it >>> increases to 140 if block size is 1 (see attached files). >>> > >>> > You can still do that. It can be done with options once the >>> decomposition is working. Its true that these solvers >>> > work better with the block size set. However, if its the P2 Laplacian >>> it does not really matter since its uncoupled. >>> > >>> > This gives me the impression that AMG will give better inversion for >>> "P2" block if I can set its block size to 3. Of course it's still an >>> hypothesis but worth to try. >>> > >>> > Another question: In one of the Petsc presentation, you said the Hypre >>> AMG does not scale well, because set up cost amortize the iterations. How >>> is it quantified? and what is the memory overhead? >>> > >>> > I said the Hypre setup cost is not scalable, but it can be amortized >>> over the iterations. You can quantify this >>> > just by looking at the PCSetUp time as your increase the number of >>> processes. I don't think they have a good >>> > model for the memory usage, and if they do, I do not know what it is. >>> However, generally Hypre takes more >>> > memory than the agglomeration MG like ML or GAMG. >>> > >>> > Thanks, >>> > >>> > Matt >>> > >>> > >>> > Giang >>> > >>> > On Mon, Jan 18, 2016 at 5:25 PM, Jed Brown wrote: >>> > Hoang Giang Bui writes: >>> > >>> > > Why P2/P2 is not for co-located discretization? >>> > >>> > Matt typed "P2/P2" when me meant "P2/P1". >>> > >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> > -- Norbert Wiener >>> > >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Wed Jan 27 11:34:09 2016 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Wed, 27 Jan 2016 17:34:09 +0000 Subject: [petsc-users] Basic vector calculation question Message-ID: Hello Suppose I have four vectors A,B, C and D with the same number of components. I would like to do the following component wise vector operation: (A ? B) / (C + D) I could calculate A-B and C+D and store the results in temporary PETSc vector temp_result1 and temp_result2 (I need to keep A,B, C and D unchanged) and then call VecPointwiseDivide(temp_result1,temp_result1,temp_result2) and get my result in temp_result1. On the other hand, to avoid creating two temporary PETSc vectors, I could call VecGetArray()/VecRestoreArray() on the four vectors, iterate over the local indices and perform the same operations at once and store the result in another vector. What is the best way and why? I think that VecGetArray() creates temporary sequential vectors as well. I?m not sure if this is what the above mentioned PETSc routines internally do. Thanks Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From hong at aspiritech.org Wed Jan 27 12:02:15 2016 From: hong at aspiritech.org (hong at aspiritech.org) Date: Wed, 27 Jan 2016 12:02:15 -0600 Subject: [petsc-users] Basic vector calculation question In-Reply-To: References: Message-ID: Salazar: Use VecGetArray()/VecRestoreArray(). They access arrays, do not create any temporary sequential vectors. Hong Hello > > Suppose I have four vectors A,B, C and D with the same number of > components. I would like to do the following component wise vector > operation: > > (A ? B) / (C + D) > > I could calculate A-B and C+D and store the results in temporary PETSc > vector temp_result1 and temp_result2 (I need to keep A,B, C and D > unchanged) and then call > VecPointwiseDivide(temp_result1,temp_result1,temp_result2) and get my > result in temp_result1. > > On the other hand, to avoid creating two temporary PETSc vectors, I could > call VecGetArray()/VecRestoreArray() on the four vectors, iterate over the > local indices and perform the same operations at once and store the result > in another vector. What is the best way and why? I think that VecGetArray() > creates temporary sequential vectors as well. I?m not sure if this is what > the above mentioned PETSc routines internally do. > > Thanks > Miguel > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Wed Jan 27 15:49:02 2016 From: epscodes at gmail.com (Xiangdong) Date: Wed, 27 Jan 2016 16:49:02 -0500 Subject: [petsc-users] repartition for dynamic load balancing Message-ID: Hello everyone, I have a question on dynamic load balance in petsc. I started running a simulation with one partition. As the simulation goes on, that partition may lead to load imbalance since it is a non-steady problem. If it is worth to perform the load balance, is there an easy way to re-partition the mesh and continue the simulation? Thanks. Xiangdong -------------- next part -------------- An HTML attachment was scrubbed... 
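Coming back to the element-wise operation (A - B) / (C + D) asked about above: following the VecGetArray()/VecRestoreArray() suggestion, a minimal sketch could look as below. It assumes a result vector R that has already been created with the same parallel layout as A, B, C and D, and it uses the read-only accessors for the inputs.

  const PetscScalar *a, *b, *c, *d;
  PetscScalar       *r;
  PetscInt          i, n;
  ierr = VecGetLocalSize(R, &n); CHKERRQ(ierr);
  ierr = VecGetArrayRead(A, &a); CHKERRQ(ierr);
  ierr = VecGetArrayRead(B, &b); CHKERRQ(ierr);
  ierr = VecGetArrayRead(C, &c); CHKERRQ(ierr);
  ierr = VecGetArrayRead(D, &d); CHKERRQ(ierr);
  ierr = VecGetArray(R, &r); CHKERRQ(ierr);
  for (i = 0; i < n; i++) r[i] = (a[i] - b[i]) / (c[i] + d[i]);   /* purely local, no communication */
  ierr = VecRestoreArray(R, &r); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(D, &d); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(C, &c); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(B, &b); CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(A, &a); CHKERRQ(ierr);

No temporary vectors are created; the alternative of forming A-B and C+D in two work vectors (e.g. with VecWAXPY()) and then calling VecPointwiseDivide() gives the same result at the cost of the extra storage.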
URL: From jed at jedbrown.org Wed Jan 27 23:20:21 2016 From: jed at jedbrown.org (Jed Brown) Date: Wed, 27 Jan 2016 22:20:21 -0700 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: Message-ID: <87lh7ap9ey.fsf@jedbrown.org> Xiangdong writes: > I have a question on dynamic load balance in petsc. I started running a > simulation with one partition. As the simulation goes on, that partition > may lead to load imbalance since it is a non-steady problem. If it is worth > to perform the load balance, is there an easy way to re-partition the mesh > and continue the simulation? Are you using a PETSc DM? What "mesh"? If you own it, then repartitioning it is entirely your business. In general, after adapting the mesh, you rebuild all algebraic data structures. Solvers can be reset (SNESReset, etc.). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bikash at umich.edu Thu Jan 28 01:32:15 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 28 Jan 2016 02:32:15 -0500 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags Message-ID: Hi, I was trying to use BVOrthogonalize() function in SLEPc. For smaller problems (10-20 vectors of length < 20,000) I'm able to use it without any trouble. For larger problems ( > 150 vectors of length > 400,000) the code aborts citing an MPI_AllReduce error with following message: Scalar value must be same on all processes, argument # 3. I was skeptical that the PETSc compilation might be faulty and tried to build a minimalistic version omitting the previously used -xcore-avx2 flags in CFLAGS abd CXXFLAGS. That seemed to have done the cure. What perplexes me is that I have been using the same code with -xcore-avx2 flags in PETSc build on a local cluster at the University of Michigan without any problem. It is only until recently when I moved to Xsede's Comet machine, that I started getting this MPI_AllReduce error with -xcore-avx2. Do you have any clue on why the same PETSc build fails on two different machines just because of a build flag? Regards, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Jan 28 01:56:42 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 28 Jan 2016 08:56:42 +0100 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: > El 28 ene 2016, a las 8:32, Bikash Kanungo escribi?: > > Hi, > > I was trying to use BVOrthogonalize() function in SLEPc. For smaller problems (10-20 vectors of length < 20,000) I'm able to use it without any trouble. For larger problems ( > 150 vectors of length > 400,000) the code aborts citing an MPI_AllReduce error with following message: > > Scalar value must be same on all processes, argument # 3. > > I was skeptical that the PETSc compilation might be faulty and tried to build a minimalistic version omitting the previously used -xcore-avx2 flags in CFLAGS abd CXXFLAGS. That seemed to have done the cure. > > What perplexes me is that I have been using the same code with -xcore-avx2 flags in PETSc build on a local cluster at the University of Michigan without any problem. It is only until recently when I moved to Xsede's Comet machine, that I started getting this MPI_AllReduce error with -xcore-avx2. 
> > Do you have any clue on why the same PETSc build fails on two different machines just because of a build flag? > > Regards, > Bikash > > -- > Bikash S. Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan > Without the complete error message I cannot tell the exact point where it is failing. Jose From bikash at umich.edu Thu Jan 28 02:13:38 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 28 Jan 2016 03:13:38 -0500 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: Hi Jose, Here is the complete error message: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014 [0]PETSC ERROR: Unknown Name on a intel-openmpi_ib named comet-03-60.sdsc.edu by bikashk Thu Jan 28 00:09:17 2016 [0]PETSC ERROR: Configure options CFLAGS="-fPIC -xcore-avx2" FFLAGS="-fPIC -xcore-avx2" CXXFLAGS="-fPIC -xcore-avx2" --prefix=/opt/petsc/intel/openmpi_ib --with-mpi=true --download-pastix=../pastix_5.2.2.12.tar.bz2 --download-ptscotch=../scotch_6.0.0_esmumps.tar.gz --with-blas-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-lapack-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-superlu_dist-include=/opt/superlu/intel/openmpi_ib/include --with-superlu_dist-lib="-L/opt/superlu/intel/openmpi_ib/lib -lsuperlu" --with-parmetis-dir=/opt/parmetis/intel/openmpi_ib --with-metis-dir=/opt/parmetis/intel/openmpi_ib --with-mpi-dir=/opt/openmpi/intel/ib --with-scalapack-dir=/opt/scalapack/intel/openmpi_ib --download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-blacs=../blacs-dev.tar.gz --download-fblaslapack=../fblaslapack-3.4.2.tar.gz --with-pic=true --with-shared-libraries=1 --with-hdf5=true --with-hdf5-dir=/opt/hdf5/intel/openmpi_ib --with-debugging=false [0]PETSC ERROR: #1 BVScaleColumn() line 380 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvops.c [0]PETSC ERROR: #2 BVOrthogonalize_GS() line 474 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c [0]PETSC ERROR: #3 BVOrthogonalize() line 535 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c [comet-03-60:27927] *** Process received signal *** [comet-03-60:27927] Signal: Aborted (6) On Thu, Jan 28, 2016 at 2:56 AM, Jose E. Roman wrote: > > > El 28 ene 2016, a las 8:32, Bikash Kanungo escribi?: > > > > Hi, > > > > I was trying to use BVOrthogonalize() function in SLEPc. For smaller > problems (10-20 vectors of length < 20,000) I'm able to use it without any > trouble. 
For larger problems ( > 150 vectors of length > 400,000) the code > aborts citing an MPI_AllReduce error with following message: > > > > Scalar value must be same on all processes, argument # 3. > > > > I was skeptical that the PETSc compilation might be faulty and tried to > build a minimalistic version omitting the previously used -xcore-avx2 flags > in CFLAGS abd CXXFLAGS. That seemed to have done the cure. > > > > What perplexes me is that I have been using the same code with > -xcore-avx2 flags in PETSc build on a local cluster at the University of > Michigan without any problem. It is only until recently when I moved to > Xsede's Comet machine, that I started getting this MPI_AllReduce error with > -xcore-avx2. > > > > Do you have any clue on why the same PETSc build fails on two different > machines just because of a build flag? > > > > Regards, > > Bikash > > > > -- > > Bikash S. Kanungo > > PhD Student > > Computational Materials Physics Group > > Mechanical Engineering > > University of Michigan > > > > Without the complete error message I cannot tell the exact point where it > is failing. > Jose > > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.magri at dicea.unipd.it Thu Jan 28 03:04:17 2016 From: victor.magri at dicea.unipd.it (victor.magri at dicea.unipd.it) Date: Thu, 28 Jan 2016 10:04:17 +0100 Subject: [petsc-users] PETSc interface to MueLu Message-ID: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Dear PETSc developers, is it possible to create an interface for MueLu (given its dependencies to other Trilinos packages)? Do you plan to do that in the future? Thank you! -- Victor A. P. Magri - PhD student Dept. of Civil, Environmental and Architectural Eng. University of Padova Via Marzolo, 9 - 35131 Padova, Italy -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Jan 28 04:18:25 2016 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 28 Jan 2016 11:18:25 +0100 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: > El 28 ene 2016, a las 9:13, Bikash Kanungo escribi?: > > Hi Jose, > > Here is the complete error message: > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014 > [0]PETSC ERROR: Unknown Name on a intel-openmpi_ib named comet-03-60.sdsc.edu by bikashk Thu Jan 28 00:09:17 2016 > [0]PETSC ERROR: Configure options CFLAGS="-fPIC -xcore-avx2" FFLAGS="-fPIC -xcore-avx2" CXXFLAGS="-fPIC -xcore-avx2" --prefix=/opt/petsc/intel/openmpi_ib --with-mpi=true --download-pastix=../pastix_5.2.2.12.tar.bz2 --download-ptscotch=../scotch_6.0.0_esmumps.tar.gz --with-blas-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-lapack-lib="-Wl,--start-group /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm" --with-superlu_dist-include=/opt/superlu/intel/openmpi_ib/include --with-superlu_dist-lib="-L/opt/superlu/intel/openmpi_ib/lib -lsuperlu" --with-parmetis-dir=/opt/parmetis/intel/openmpi_ib --with-metis-dir=/opt/parmetis/intel/openmpi_ib --with-mpi-dir=/opt/openmpi/intel/ib --with-scalapack-dir=/opt/scalapack/intel/openmpi_ib --download-mumps=../MUMPS_4.10.0-p3.tar.gz --download-blacs=../blacs-dev.tar.gz --download-fblaslapack=../fblaslapack-3.4.2.tar.gz --with-pic=true --with-shared-libraries=1 --with-hdf5=true --with-hdf5-dir=/opt/hdf5/intel/openmpi_ib --with-debugging=false > [0]PETSC ERROR: #1 BVScaleColumn() line 380 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvops.c > [0]PETSC ERROR: #2 BVOrthogonalize_GS() line 474 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > [0]PETSC ERROR: #3 BVOrthogonalize() line 535 in /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > [comet-03-60:27927] *** Process received signal *** > [comet-03-60:27927] Signal: Aborted (6) > > Here are some comments: - These kind of errors appear only in debugging mode. I don't know why you are getting them since you have --with-debugging=false - The flag -xcore-avx2 enables fused multiply-add (FMA) instructions, which means you get slightly more accurate floating-point results. This could explain why you get different behaviour with/without this flag. - The argument of BVScaleColumn() is guaranteed to be the same in all processes, so the only explanation is that it has become a NaN. [Note that in petsc-master (and hence petsc-3.7) NaN's no longer trigger this error.] - My conclusion is that your column vectors of the BV object are not linearly independent, so eventually the vector norm is (almost) zero. The error will appear only if the computed value is exactly zero. In summary: BVOrthogonalize() is new in SLEPc, and it is not very well tested. In particular, linearly dependent vectors are not handled well. For the next release I will add code to take into account rank-deficient BV's. In the meantime, you may want to try running with '-bv_orthog_block chol' (it uses a different orthogonalization algorithm). 
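For reference, the same choice can also be made in code instead of on the command line; a sketch, assuming the BVSetOrthogonalization() signature of more recent SLEPc releases (the block-type argument may not be available in 3.5.x, in which case the '-bv_orthog_block chol' option is the way to go), with bv being the user's BV object:

  ierr = BVSetOrthogonalization(bv, BV_ORTHOG_CGS, BV_ORTHOG_REFINE_IFNEEDED,
                                PETSC_DEFAULT, BV_ORTHOG_BLOCK_CHOL); CHKERRQ(ierr);
  ierr = BVOrthogonalize(bv, NULL); CHKERRQ(ierr);

Checking the column norms beforehand (e.g. with BVNormColumn()) is a cheap way to spot the nearly dependent columns that trigger the failure described above.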
Jose From bikash at umich.edu Thu Jan 28 04:45:44 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Thu, 28 Jan 2016 05:45:44 -0500 Subject: [petsc-users] MPI_AllReduce error with -xcore-avx2 flags In-Reply-To: References: Message-ID: Yeah I suspected linear dependence. But I was puzzled by the error occurring in one machine and not the other. But even on the machine that it failed, it failed for some runs and passed successfully for others. So it suggests that the vector norm is almost zero in certain cases (i.e, in the runs that survive) and zero in others (i.e., the runs that fail). I'll use -bv_orthog_block chol to see if the error persists. Thanks a ton, Jose. Regards, Bikash On Thu, Jan 28, 2016 at 5:18 AM, Jose E. Roman wrote: > > > El 28 ene 2016, a las 9:13, Bikash Kanungo escribi?: > > > > Hi Jose, > > > > Here is the complete error message: > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Invalid argument > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.5.2, Sep, 08, 2014 > > [0]PETSC ERROR: Unknown Name on a intel-openmpi_ib named > comet-03-60.sdsc.edu by bikashk Thu Jan 28 00:09:17 2016 > > [0]PETSC ERROR: Configure options CFLAGS="-fPIC -xcore-avx2" > FFLAGS="-fPIC -xcore-avx2" CXXFLAGS="-fPIC -xcore-avx2" > --prefix=/opt/petsc/intel/openmpi_ib --with-mpi=true > --download-pastix=../pastix_5.2.2.12.tar.bz2 > --download-ptscotch=../scotch_6.0.0_esmumps.tar.gz > --with-blas-lib="-Wl,--start-group > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a > > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a > -Wl,--end-group -lpthread -lm" --with-lapack-lib="-Wl,--start-group > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.a > > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_sequential.a > /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_core.a > -Wl,--end-group -lpthread -lm" > --with-superlu_dist-include=/opt/superlu/intel/openmpi_ib/include > --with-superlu_dist-lib="-L/opt/superlu/intel/openmpi_ib/lib -lsuperlu" > --with-parmetis-dir=/opt/parmetis/intel/openmpi_ib > --with-metis-dir=/opt/parmetis/intel/openmpi_ib > --with-mpi-dir=/opt/openmpi/intel/ib > --with-scalapack-dir=/opt/scalapack/intel/openmpi_ib > --download-mumps=../MUMPS_4.10.0-p3.tar.gz > --download-blacs=../blacs-dev.tar.gz > --download-fblaslapack=../fblaslapack-3.4.2.tar.gz --with-pic=true > --with-shared-libraries=1 --with-hdf5=true > --with-hdf5-dir=/opt/hdf5/intel/openmpi_ib --with-debugging=false > > [0]PETSC ERROR: #1 BVScaleColumn() line 380 in > /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvops.c > > [0]PETSC ERROR: #2 BVOrthogonalize_GS() line 474 in > /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > > [0]PETSC ERROR: #3 BVOrthogonalize() line 535 in > /scratch/build/git/math-roll/BUILD/sdsc-slepc_intel_openmpi_ib-3.5.3/slepc-3.5.3/src/sys/classes/bv/interface/bvorthog.c > > [comet-03-60:27927] *** Process received signal *** > > [comet-03-60:27927] Signal: Aborted (6) > > > > > > Here are some comments: > - These kind of errors appear 
only in debugging mode. I don't know why you > are getting them since you have --with-debugging=false > - The flag -xcore-avx2 enables fused multiply-add (FMA) instructions, > which means you get slightly more accurate floating-point results. This > could explain why you get different behaviour with/without this flag. > - The argument of BVScaleColumn() is guaranteed to be the same in all > processes, so the only explanation is that it has become a NaN. [Note that > in petsc-master (and hence petsc-3.7) NaN's no longer trigger this error.] > - My conclusion is that your column vectors of the BV object are not > linearly independent, so eventually the vector norm is (almost) zero. The > error will appear only if the computed value is exactly zero. > > In summary: BVOrthogonalize() is new in SLEPc, and it is not very well > tested. In particular, linearly dependent vectors are not handled well. For > the next release I will add code to take into account rank-deficient BV's. > In the meantime, you may want to try running with '-bv_orthog_block chol' > (it uses a different orthogonalization algorithm). > > Jose > > -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Jan 28 10:32:53 2016 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 28 Jan 2016 10:32:53 -0600 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Message-ID: Victor, What are the differences between MueLu and ML? Hong On Thu, Jan 28, 2016 at 3:04 AM, wrote: > Dear PETSc developers, > > is it possible to create an interface for MueLu (given its dependencies to > other Trilinos packages)? Do you plan to do that in the future? > > Thank you! > -- > Victor A. P. Magri - PhD student > Dept. of Civil, Environmental and Architectural Eng. > University of Padova > Via Marzolo, 9 - 35131 Padova, Italy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.magri at dicea.unipd.it Thu Jan 28 11:09:17 2016 From: victor.magri at dicea.unipd.it (victor.magri at dicea.unipd.it) Date: Thu, 28 Jan 2016 18:09:17 +0100 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Message-ID: <47f4ad28cd4241b61690f8bf2dc8e9e7@dicea.unipd.it> Dear Hong, According to this link http://www.fastmath-scidac.org/software/mlmuelu.html [1] MueLu is the sucessor to ML and should support a larger number of scalar types. Also according to this presentation https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/MueLuOverview_TUG2013.pdf [2] I suppose that MueLu would give a cleaner implementation of ML's features and possibly give a faster code. Also, as it supports the Kokkos library, we would have the possibility to run on MPI+threads or MPI+GPU. However, I think that implementing an interface for this could be a problem since PETSc works better with pure MPI, please correct me if I am wrong about this. Anyway, if it were possible, I just would like to try both multigrid implementations through PETSc and see how they behave. Thank you! Il 28-01-2016 17:32 Hong ha scritto: > Victor, > What are the differences between MueLu and ML? 
> Hong > > On Thu, Jan 28, 2016 at 3:04 AM, wrote: > >> Dear PETSc developers, >> >> is it possible to create an interface for MueLu (given its dependencies to other Trilinos packages)? Do you plan to do that in the future? >> >> Thank you! >> -- >> >> Victor A. P. Magri - PhD student >> Dept. of Civil, Environmental and Architectural Eng. >> University of Padova >> Via Marzolo, 9 - 35131 Padova, Italy -- Victor A. P. Magri - PhD student Dept. of Civil, Environmental and Architectural Eng. University of Padova Via Marzolo, 9 - 35131 Padova, Italy Links: ------ [1] http://www.fastmath-scidac.org/software/mlmuelu.html [2] https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/MueLuOverview_TUG2013.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jan 28 11:11:50 2016 From: epscodes at gmail.com (Xiangdong) Date: Thu, 28 Jan 2016 12:11:50 -0500 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: <87lh7ap9ey.fsf@jedbrown.org> References: <87lh7ap9ey.fsf@jedbrown.org> Message-ID: Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with Nx=10 and np=2. At the beginning each processor owns 5 cells. After some simulation time, I found that repartition the 10 cells into 3 and 7 is better for load balancing. Is there an easy/efficient way to migrate data from one partition to another partition? I am wondering whether there are some functions or libraries help me manage this redistribution. Thanks. Xiangdong On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: > Xiangdong writes: > > > I have a question on dynamic load balance in petsc. I started running a > > simulation with one partition. As the simulation goes on, that partition > > may lead to load imbalance since it is a non-steady problem. If it is > worth > > to perform the load balance, is there an easy way to re-partition the > mesh > > and continue the simulation? > > Are you using a PETSc DM? What "mesh"? If you own it, then > repartitioning it is entirely your business. > > In general, after adapting the mesh, you rebuild all algebraic data > structures. Solvers can be reset (SNESReset, etc.). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 28 11:21:52 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 28 Jan 2016 11:21:52 -0600 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> Message-ID: <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: > > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with Nx=10 and np=2. At the beginning each processor owns 5 cells. After some simulation time, I found that repartition the 10 cells into 3 and 7 is better for load balancing. Is there an easy/efficient way to migrate data from one partition to another partition? I am wondering whether there are some functions or libraries help me manage this redistribution. For DMDA we don't provide tools for doing this, nor do we expect to. For this type of need for dynamic migration we recommend using DMPlex or some external mesh management system. Barry > > Thanks. > Xiangdong > > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: > Xiangdong writes: > > > I have a question on dynamic load balance in petsc. I started running a > > simulation with one partition. 
As the simulation goes on, that partition > > may lead to load imbalance since it is a non-steady problem. If it is worth > > to perform the load balance, is there an easy way to re-partition the mesh > > and continue the simulation? > > Are you using a PETSc DM? What "mesh"? If you own it, then > repartitioning it is entirely your business. > > In general, after adapting the mesh, you rebuild all algebraic data > structures. Solvers can be reset (SNESReset, etc.). > From knepley at gmail.com Thu Jan 28 11:25:07 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 11:25:07 -0600 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> Message-ID: On Thu, Jan 28, 2016 at 3:04 AM, wrote: > Dear PETSc developers, > > is it possible to create an interface for MueLu (given its dependencies to > other Trilinos packages)? Do you plan to do that in the future? > > Right now, itsa not clear that this provides anything our ML interface does not. Matt > Thank you! > -- > Victor A. P. Magri - PhD student > Dept. of Civil, Environmental and Architectural Eng. > University of Padova > Via Marzolo, 9 - 35131 Padova, Italy > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jan 28 11:36:42 2016 From: epscodes at gmail.com (Xiangdong) Date: Thu, 28 Jan 2016 12:36:42 -0500 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: What functions/tools can I use for dynamic migration in DMPlex framework? Can you also name some external mesh management systems? Thanks. Xiangdong On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith wrote: > > > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: > > > > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with > Nx=10 and np=2. At the beginning each processor owns 5 cells. After some > simulation time, I found that repartition the 10 cells into 3 and 7 is > better for load balancing. Is there an easy/efficient way to migrate data > from one partition to another partition? I am wondering whether there are > some functions or libraries help me manage this redistribution. > > For DMDA we don't provide tools for doing this, nor do we expect to. For > this type of need for dynamic migration we recommend using DMPlex or some > external mesh management system. > > Barry > > > > > Thanks. > > Xiangdong > > > > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: > > Xiangdong writes: > > > > > I have a question on dynamic load balance in petsc. I started running a > > > simulation with one partition. As the simulation goes on, that > partition > > > may lead to load imbalance since it is a non-steady problem. If it is > worth > > > to perform the load balance, is there an easy way to re-partition the > mesh > > > and continue the simulation? > > > > Are you using a PETSc DM? What "mesh"? If you own it, then > > repartitioning it is entirely your business. > > > > In general, after adapting the mesh, you rebuild all algebraic data > > structures. Solvers can be reset (SNESReset, etc.). 
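The reset pattern described just above might look roughly as follows once a rebalanced DM and the migrated state are in hand (dmNew and Xnew are placeholder names for the repartitioned DM and the transferred solution):

  /* discard the solver's internal data built for the old partition */
  ierr = SNESReset(snes); CHKERRQ(ierr);
  /* attach the repartitioned DM; operators and work vectors are rebuilt from it */
  ierr = SNESSetDM(snes, dmNew); CHKERRQ(ierr);
  ierr = SNESSetFromOptions(snes); CHKERRQ(ierr);
  /* continue the simulation with the migrated solution as the initial guess */
  ierr = SNESSolve(snes, NULL, Xnew); CHKERRQ(ierr);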
> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 28 11:47:49 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 11:47:49 -0600 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: > What functions/tools can I use for dynamic migration in DMPlex framework? > In this paper, http://arxiv.org/abs/1506.06194, we explain how to use the DMPlexMigrate() function to redistribute data. In the future, its likely we will add a function that wraps it up with determination of the new partition at the same time. > Can you also name some external mesh management systems? Thanks. > I will note that if load balance in the solve is your only concern, PCTelescope can redistribute the DMDA solve. Thanks, Matt > > Xiangdong > > On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith wrote: > >> >> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >> > >> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with >> Nx=10 and np=2. At the beginning each processor owns 5 cells. After some >> simulation time, I found that repartition the 10 cells into 3 and 7 is >> better for load balancing. Is there an easy/efficient way to migrate data >> from one partition to another partition? I am wondering whether there are >> some functions or libraries help me manage this redistribution. >> >> For DMDA we don't provide tools for doing this, nor do we expect to. >> For this type of need for dynamic migration we recommend using DMPlex or >> some external mesh management system. >> >> Barry >> >> > >> > Thanks. >> > Xiangdong >> > >> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: >> > Xiangdong writes: >> > >> > > I have a question on dynamic load balance in petsc. I started running >> a >> > > simulation with one partition. As the simulation goes on, that >> partition >> > > may lead to load imbalance since it is a non-steady problem. If it is >> worth >> > > to perform the load balance, is there an easy way to re-partition the >> mesh >> > > and continue the simulation? >> > >> > Are you using a PETSc DM? What "mesh"? If you own it, then >> > repartitioning it is entirely your business. >> > >> > In general, after adapting the mesh, you rebuild all algebraic data >> > structures. Solvers can be reset (SNESReset, etc.). >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Thu Jan 28 13:37:49 2016 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 28 Jan 2016 20:37:49 +0100 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: On Thursday, 28 January 2016, Matthew Knepley > wrote: > On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: > >> What functions/tools can I use for dynamic migration in DMPlex framework? >> > > In this paper, http://arxiv.org/abs/1506.06194, we explain how to use the > DMPlexMigrate() function to redistribute data. > In the future, its likely we will add a function that wraps it up with > determination of the new partition at the same time. 
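A rough sketch of what the redistribution step could look like with the existing API (not the wrapper mentioned above), assuming the petsc-3.6-style DMPlexDistribute() signature; dm, sec, state and their *New counterparts are placeholder names, and secNew/stateNew are assumed to have been created already for the new layout:

  DM      dmNew = NULL;
  PetscSF sfMigration;
  /* ask the partitioner attached to dm for a new partition and migrate the topology */
  ierr = DMPlexDistribute(dm, 0, &sfMigration, &dmNew); CHKERRQ(ierr);
  if (dmNew) {
    /* move the cell-wise simulation state to the new layout using the migration SF */
    ierr = DMPlexDistributeField(dm, sfMigration, sec, state, secNew, stateNew); CHKERRQ(ierr);
    ierr = PetscSFDestroy(&sfMigration); CHKERRQ(ierr);
    ierr = DMDestroy(&dm); CHKERRQ(ierr);
    dm = dmNew;
  }

After this the solver can be reset and pointed at dmNew, as in the SNESReset() sketch earlier in the thread.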
> > >> Can you also name some external mesh management systems? Thanks. >> > > I will note that if load balance in the solve is your only concern, > PCTelescope can redistribute the DMDA solve. > Currently Telescope will only repartition 2d and 3d DMDA's. It does perform data migration and allows users to specify the number of ranks to be used in each I,j,k direction via -xxx_grid_x etc. I wouldn't say it supports "load balancing", as there is no mechanism to define number of points in each sub-domain Cheers Dave > > Thanks, > > Matt > > >> >> Xiangdong >> >> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith wrote: >> >>> >>> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >>> > >>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA with >>> Nx=10 and np=2. At the beginning each processor owns 5 cells. After some >>> simulation time, I found that repartition the 10 cells into 3 and 7 is >>> better for load balancing. Is there an easy/efficient way to migrate data >>> from one partition to another partition? I am wondering whether there are >>> some functions or libraries help me manage this redistribution. >>> >>> For DMDA we don't provide tools for doing this, nor do we expect to. >>> For this type of need for dynamic migration we recommend using DMPlex or >>> some external mesh management system. >>> >>> Barry >>> >>> > >>> > Thanks. >>> > Xiangdong >>> > >>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: >>> > Xiangdong writes: >>> > >>> > > I have a question on dynamic load balance in petsc. I started >>> running a >>> > > simulation with one partition. As the simulation goes on, that >>> partition >>> > > may lead to load imbalance since it is a non-steady problem. If it >>> is worth >>> > > to perform the load balance, is there an easy way to re-partition >>> the mesh >>> > > and continue the simulation? >>> > >>> > Are you using a PETSc DM? What "mesh"? If you own it, then >>> > repartitioning it is entirely your business. >>> > >>> > In general, after adapting the mesh, you rebuild all algebraic data >>> > structures. Solvers can be reset (SNESReset, etc.). >>> > >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Jan 28 13:41:45 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 13:41:45 -0600 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: On Thu, Jan 28, 2016 at 1:37 PM, Dave May wrote: > > > On Thursday, 28 January 2016, Matthew Knepley wrote: > >> On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: >> >>> What functions/tools can I use for dynamic migration in DMPlex framework? >>> >> >> In this paper, http://arxiv.org/abs/1506.06194, we explain how to use >> the DMPlexMigrate() function to redistribute data. >> In the future, its likely we will add a function that wraps it up with >> determination of the new partition at the same time. >> >> >>> Can you also name some external mesh management systems? Thanks. >>> >> >> I will note that if load balance in the solve is your only concern, >> PCTelescope can redistribute the DMDA solve. >> > > Currently Telescope will only repartition 2d and 3d DMDA's. 
It > does perform data migration and allows users to specify the number of ranks > to be used in each I,j,k direction via -xxx_grid_x etc. I wouldn't say it > supports "load balancing", as there is no mechanism to define number of > points in each sub-domain > Let me be more precise. All I have suggested for any of this are redistribution tools. You will have to determine the right weights for "load balance", which I think is always true. Using the default weights is crazy. Matt > Cheers > Dave > > > >> >> Thanks, >> >> Matt >> >> >>> >>> Xiangdong >>> >>> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith >>> wrote: >>> >>>> >>>> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >>>> > >>>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA >>>> with Nx=10 and np=2. At the beginning each processor owns 5 cells. After >>>> some simulation time, I found that repartition the 10 cells into 3 and 7 is >>>> better for load balancing. Is there an easy/efficient way to migrate data >>>> from one partition to another partition? I am wondering whether there are >>>> some functions or libraries help me manage this redistribution. >>>> >>>> For DMDA we don't provide tools for doing this, nor do we expect to. >>>> For this type of need for dynamic migration we recommend using DMPlex or >>>> some external mesh management system. >>>> >>>> Barry >>>> >>>> > >>>> > Thanks. >>>> > Xiangdong >>>> > >>>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown wrote: >>>> > Xiangdong writes: >>>> > >>>> > > I have a question on dynamic load balance in petsc. I started >>>> running a >>>> > > simulation with one partition. As the simulation goes on, that >>>> partition >>>> > > may lead to load imbalance since it is a non-steady problem. If it >>>> is worth >>>> > > to perform the load balance, is there an easy way to re-partition >>>> the mesh >>>> > > and continue the simulation? >>>> > >>>> > Are you using a PETSc DM? What "mesh"? If you own it, then >>>> > repartitioning it is entirely your business. >>>> > >>>> > In general, after adapting the mesh, you rebuild all algebraic data >>>> > structures. Solvers can be reset (SNESReset, etc.). >>>> > >>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From epscodes at gmail.com Thu Jan 28 15:02:45 2016 From: epscodes at gmail.com (Xiangdong) Date: Thu, 28 Jan 2016 16:02:45 -0500 Subject: [petsc-users] repartition for dynamic load balancing In-Reply-To: References: <87lh7ap9ey.fsf@jedbrown.org> <3BE17FF7-4D30-48DF-B1AE-979C22E145DF@mcs.anl.gov> Message-ID: I am thinking to use parmetis to repartition the mesh (based on new updated weights for vertices), and use some functions (maybe DMPlexMigrate) to redistribute the data. I will look into Matt's paper to see whether it is possible. Thanks. Xiangdong On Thu, Jan 28, 2016 at 2:41 PM, Matthew Knepley wrote: > On Thu, Jan 28, 2016 at 1:37 PM, Dave May wrote: > >> >> >> On Thursday, 28 January 2016, Matthew Knepley wrote: >> >>> On Thu, Jan 28, 2016 at 11:36 AM, Xiangdong wrote: >>> >>>> What functions/tools can I use for dynamic migration in DMPlex >>>> framework? 
>>>> >>> >>> In this paper, http://arxiv.org/abs/1506.06194, we explain how to use >>> the DMPlexMigrate() function to redistribute data. >>> In the future, its likely we will add a function that wraps it up with >>> determination of the new partition at the same time. >>> >>> >>>> Can you also name some external mesh management systems? Thanks. >>>> >>> >>> I will note that if load balance in the solve is your only concern, >>> PCTelescope can redistribute the DMDA solve. >>> >> >> Currently Telescope will only repartition 2d and 3d DMDA's. It >> does perform data migration and allows users to specify the number of ranks >> to be used in each I,j,k direction via -xxx_grid_x etc. I wouldn't say it >> supports "load balancing", as there is no mechanism to define number of >> points in each sub-domain >> > > Let me be more precise. All I have suggested for any of this are > redistribution tools. You will have to determine > the right weights for "load balance", which I think is always true. Using > the default weights is crazy. > > Matt > > >> Cheers >> Dave >> >> >> >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> >>>> Xiangdong >>>> >>>> On Thu, Jan 28, 2016 at 12:21 PM, Barry Smith >>>> wrote: >>>> >>>>> >>>>> > On Jan 28, 2016, at 11:11 AM, Xiangdong wrote: >>>>> > >>>>> > Yes, it can be either DMDA or DMPlex. For example, I have 1D DMDA >>>>> with Nx=10 and np=2. At the beginning each processor owns 5 cells. After >>>>> some simulation time, I found that repartition the 10 cells into 3 and 7 is >>>>> better for load balancing. Is there an easy/efficient way to migrate data >>>>> from one partition to another partition? I am wondering whether there are >>>>> some functions or libraries help me manage this redistribution. >>>>> >>>>> For DMDA we don't provide tools for doing this, nor do we expect to. >>>>> For this type of need for dynamic migration we recommend using DMPlex or >>>>> some external mesh management system. >>>>> >>>>> Barry >>>>> >>>>> > >>>>> > Thanks. >>>>> > Xiangdong >>>>> > >>>>> > On Thu, Jan 28, 2016 at 12:20 AM, Jed Brown >>>>> wrote: >>>>> > Xiangdong writes: >>>>> > >>>>> > > I have a question on dynamic load balance in petsc. I started >>>>> running a >>>>> > > simulation with one partition. As the simulation goes on, that >>>>> partition >>>>> > > may lead to load imbalance since it is a non-steady problem. If it >>>>> is worth >>>>> > > to perform the load balance, is there an easy way to re-partition >>>>> the mesh >>>>> > > and continue the simulation? >>>>> > >>>>> > Are you using a PETSc DM? What "mesh"? If you own it, then >>>>> > repartitioning it is entirely your business. >>>>> > >>>>> > In general, after adapting the mesh, you rebuild all algebraic data >>>>> > structures. Solvers can be reset (SNESReset, etc.). >>>>> > >>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amneetb at live.unc.edu Thu Jan 28 16:53:13 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Thu, 28 Jan 2016 22:53:13 +0000 Subject: [petsc-users] MatCreateSeqDense Message-ID: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Hi Folks, Is there a way to get back the user allocated raw data pointer (column-major order) used in creating MatCreateSeqDense() from the Mat object? Thanks, --Amneet From knepley at gmail.com Thu Jan 28 17:06:34 2016 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 28 Jan 2016 17:06:34 -0600 Subject: [petsc-users] MatCreateSeqDense In-Reply-To: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> References: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Message-ID: On Thu, Jan 28, 2016 at 4:53 PM, Bhalla, Amneet Pal S wrote: > Hi Folks, > > Is there a way to get back the user allocated raw data pointer > (column-major order) used in creating MatCreateSeqDense() from the Mat > object? > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDenseGetArray.html Matt > Thanks, > --Amneet -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From amneetb at live.unc.edu Thu Jan 28 18:23:17 2016 From: amneetb at live.unc.edu (Bhalla, Amneet Pal S) Date: Fri, 29 Jan 2016 00:23:17 +0000 Subject: [petsc-users] MatCreateSeqDense In-Reply-To: References: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Message-ID: Thanks! Another related question: If I do something like this: double* data; // do stuff with data data[i] = ... Mat A; MatCreateSeqDense(...,data,..., &A); // do more stuff with data data[i] = .. Now would the matrix A reflect the change (i.e updated A[i][j]) without making an explicit call to PetscObjectStateIncrease((PetscObject)A)? On Jan 28, 2016, at 3:06 PM, Matthew Knepley > wrote: On Thu, Jan 28, 2016 at 4:53 PM, Bhalla, Amneet Pal S > wrote: Hi Folks, Is there a way to get back the user allocated raw data pointer (column-major order) used in creating MatCreateSeqDense() from the Mat object? http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDenseGetArray.html Matt Thanks, --Amneet -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Jan 28 18:26:45 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 28 Jan 2016 18:26:45 -0600 Subject: [petsc-users] MatCreateSeqDense In-Reply-To: References: <629BA9C4-5294-42B3-AA82-F404CBE027F1@ad.unc.edu> Message-ID: > On Jan 28, 2016, at 6:23 PM, Bhalla, Amneet Pal S wrote: > > Thanks! > > Another related question: If I do something like this: > > double* data; > > // do stuff with data > data[i] = ... > > Mat A; > MatCreateSeqDense(...,data,..., &A); > > // do more stuff with data > data[i] = .. > > Now would the matrix A reflect the change (i.e updated A[i][j]) without making an explicit call to PetscObjectStateIncrease((PetscObject)A)? The values in the matrix will be different but the matrix object will not know you have changed anything and hence strange stuff might happen. We highly recommend that after you have created the matrix you do not use the data[] array anymore. 
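In code, the recommended pattern (spelled out below) is something like this sketch, where i, j, m and new_value are placeholders and the leading dimension is assumed to be the number of rows m:

  PetscScalar    *a;
  PetscErrorCode ierr;

  ierr = MatDenseGetArray(A, &a);CHKERRQ(ierr);
  a[i + j*m] = new_value;   /* column-major entry (i,j) of the m x n dense matrix */
  ierr = MatDenseRestoreArray(A, &a);CHKERRQ(ierr);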
Instead call MatDenseGetArray() change stuff MatDenseRestoreArray() this automatically increases the matrix state and is cleaner code anyways (calling MatDenseGetArray() is super fast because it only gets the pointer you already provided). Barry > > >> On Jan 28, 2016, at 3:06 PM, Matthew Knepley wrote: >> >> On Thu, Jan 28, 2016 at 4:53 PM, Bhalla, Amneet Pal S wrote: >> Hi Folks, >> >> Is there a way to get back the user allocated raw data pointer (column-major order) used in creating MatCreateSeqDense() from the Mat object? >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatDenseGetArray.html >> >> Matt >> >> Thanks, >> --Amneet >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener > From bisheshkh at gmail.com Fri Jan 29 11:22:12 2016 From: bisheshkh at gmail.com (Bishesh Khanal) Date: Fri, 29 Jan 2016 18:22:12 +0100 Subject: [petsc-users] no protocol specified Message-ID: Hello, I installed petsc today in our new cluster environment, everything looked fine except for several mpi related deprecated function warnings such as below: /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c: In function ?PetscErrorCode PetscObjectName(PetscObject)?: /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c:128:12: warning: ?int MPI_Attr_get(MPI_Comm, int, void*, int*)? is deprecated (declared at /opt/openmpi/gcc/current/include/mpi.h:1227): MPI_Attr_get is superseded by MPI_Comm_get_attr in MPI-2.0 [-Wdeprecated-declarations] ierr = MPI_Attr_get(obj->comm,Petsc_Counter_keyval,(void*)&counter,&flg);CHKERRQ(ierr); When I ran make test after installation, I got the following results: Running test examples to verify correct installation Using PETSC_DIR=/data/asclepios/user/bkhanal/softwares/petscInstalledDebug and PETSC_ARCH= Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html No protocol specified No protocol specified lid velocity = 0.0016, prandtl # = 1, grashof # = 1 Number of SNES iterations = 2 Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html No protocol specified No protocol specified lid velocity = 0.0016, prandtl # = 1, grashof # = 1 Number of SNES iterations = 2 Possible error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html No protocol specified No protocol specified Number of SNES iterations = 4 Completed test examples ========================================= I also tested one of my codes with this new setup. It seems to give me correct results but the output also displays No protocol specified (twice). Is this a mere warning or should I worry about it ? Thanks, Bishesh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Jan 29 11:45:28 2016 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 29 Jan 2016 11:45:28 -0600 Subject: [petsc-users] no protocol specified In-Reply-To: References: Message-ID: On Fri, Jan 29, 2016 at 11:22 AM, Bishesh Khanal wrote: > Hello, > I installed petsc today in our new cluster environment, everything looked > fine except for several mpi related deprecated function warnings such as > below: > > /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c: In > function ?PetscErrorCode PetscObjectName(PetscObject)?: > /data/asclepios/user/bkhanal/softwares/petsc/src/sys/objects/pname.c:128:12: > warning: ?int MPI_Attr_get(MPI_Comm, int, void*, int*)? is deprecated > (declared at /opt/openmpi/gcc/current/include/mpi.h:1227): MPI_Attr_get is > superseded by MPI_Comm_get_attr in MPI-2.0 [-Wdeprecated-declarations] > ierr = > MPI_Attr_get(obj->comm,Petsc_Counter_keyval,(void*)&counter,&flg);CHKERRQ(ierr); > > > When I ran make test after installation, I got the following results: > > Running test examples to verify correct installation > Using PETSC_DIR=/data/asclepios/user/bkhanal/softwares/petscInstalledDebug > and PETSC_ARCH= > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > No protocol specified > No protocol specified > lid velocity = 0.0016, prandtl # = 1, grashof # = 1 > Number of SNES iterations = 2 > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > processes > See http://www.mcs.anl.gov/petsc/documentation/faq.html > No protocol specified > No protocol specified > lid velocity = 0.0016, prandtl # = 1, grashof # = 1 > Number of SNES iterations = 2 > Possible error running Fortran example src/snes/examples/tutorials/ex5f > with 1 MPI process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > No protocol specified > No protocol specified > Number of SNES iterations = 4 > Completed test examples > ========================================= > > I also tested one of my codes with this new setup. It seems to give me > correct results but the output also displays No protocol specified (twice). > > Is this a mere warning or should I worry about it ? > It looks like it is connected to your MPI configuration on this machine: https://www-auth.cs.wisc.edu/lists/htcondor-users/2013-March/msg00022.shtml Thanks, Matt > Thanks, > Bishesh > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Jan 29 15:44:34 2016 From: jed at jedbrown.org (Jed Brown) Date: Fri, 29 Jan 2016 14:44:34 -0700 Subject: [petsc-users] PETSc interface to MueLu In-Reply-To: <47f4ad28cd4241b61690f8bf2dc8e9e7@dicea.unipd.it> References: <12e160516897f8a313d892a35bd8a438@dicea.unipd.it> <47f4ad28cd4241b61690f8bf2dc8e9e7@dicea.unipd.it> Message-ID: <87oac4cb7h.fsf@jedbrown.org> victor.magri at dicea.unipd.it writes: > I suppose that MueLu would give a cleaner implementation of ML's > features and possibly give a faster code. Last I heard, the MueLu implementation was slower. I don't know if that has been fixed more recently, but we would be more motivated to write the interface _after_ they demonstrate some clear benefit in a direct comparison. 
> Also, as it supports the Kokkos library, we would have the possibility > to run on MPI+threads or MPI+GPU. However, I think that implementing > an interface for this could be a problem since PETSc works better with > pure MPI, please correct me if I am wrong about this. There's a fair chance their code also works better with pure MPI. (There's no fundamental reason why MPI+threads should be faster than MPI-only, and some reasons why it could be slower.) That said, you can see lots of implementation artifacts on particular machines or for particular implementations. Anyway, there is nothing preventing an interface except the opportunity cost of working on something with such dubious expected value and zero research value, versus things with immediate, direct impact. > Anyway, if it were possible, I just would like to try both multigrid > implementations through PETSc and see how they behave. Patches welcome. (In the short term, and in lieu of some convincing demonstration.) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 818 bytes Desc: not available URL: From bikash at umich.edu Sat Jan 30 18:56:26 2016 From: bikash at umich.edu (Bikash Kanungo) Date: Sat, 30 Jan 2016 19:56:26 -0500 Subject: [petsc-users] Segmentation fault in MatAssembly Message-ID: Hi, I'm getting segmentation fault while assembling large matrices (~4000000 X 4000000) across 480 processors. The error usually shows up randomly only in large problems (i.e, when I exceed matrix size of ~2000000x200000). There are few rows in my matrix for which non-local contributions are added from all other processors. So I believe the buffer size during MatSetValues gets larger with matrix size which in some way signals a segmentation fault. So is there a smart way of avoiding such error? Regards, Bikash -- Bikash S. Kanungo PhD Student Computational Materials Physics Group Mechanical Engineering University of Michigan -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Jan 30 19:15:06 2016 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 30 Jan 2016 19:15:06 -0600 Subject: [petsc-users] Segmentation fault in MatAssembly In-Reply-To: References: Message-ID: <11F56552-499F-465B-91D0-787BC6857B86@mcs.anl.gov> Bikash The most likely cause is due to integer overflow. For such large problems you should make a new PETSC_ARCH say arch-large and configure with the additional option --with-64-bit-indices then PetscInt will become a 64 bit integer which will never overflow. Make sure that your code always uses PetscInt for integers passed to PETSc and not int or Fortran integer. If you do get crashes you can run in the debugger and likely you will find that somewhere you still have an int around instead of a PetscInt Barry > On Jan 30, 2016, at 6:56 PM, Bikash Kanungo wrote: > > Hi, > > I'm getting segmentation fault while assembling large matrices (~4000000 X 4000000) across 480 processors. The error usually shows up randomly only in large problems (i.e, when I exceed matrix size of ~2000000x200000). There are few rows in my matrix for which non-local contributions are added from all other processors. So I believe the buffer size during MatSetValues gets larger with matrix size which in some way signals a segmentation fault. So is there a smart way of avoiding such error? > > Regards, > Bikash > > -- > Bikash S. 
Kanungo > PhD Student > Computational Materials Physics Group > Mechanical Engineering > University of Michigan >
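To make the PetscInt point above concrete: once PETSc is configured --with-64-bit-indices, a plain C int no longer matches PetscInt, so an index array declared as int and handed to MatSetValues() gives PETSc 4-byte data where it expects 8-byte indices, and products such as nrows*ncols can overflow a 32-bit int even earlier. A minimal sketch with hypothetical names (the 3x3 block and ADD_VALUES are only illustrative):

  /* Declare every integer that is passed to (or received from) PETSc as PetscInt,
     never as int, so the sizes stay correct with --with-64-bit-indices */
  PetscInt       rows[3] = {0, 1, 2}, cols[3] = {0, 1, 2};
  PetscScalar    vals[9];          /* values for the 3x3 block, filled elsewhere */
  PetscErrorCode ierr;

  ierr = MatSetValues(A, 3, rows, 3, cols, vals, ADD_VALUES);CHKERRQ(ierr);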