From berend at chalmers.se Fri Feb 2 02:42:13 2007 From: berend at chalmers.se (Berend van Wachem) Date: Fri, 2 Feb 2007 09:42:13 +0100 Subject: Using PETSc in structured c-grid for CFD and multigrid In-Reply-To: <804ab5d40701311932o43d85cc2q9dbf7d6189a47cdd@mail.gmail.com> References: <804ab5d40701251902x31cd3d29ye97ce5c0b2924e4d@mail.gmail.com> <804ab5d40701311932o43d85cc2q9dbf7d6189a47cdd@mail.gmail.com> Message-ID: <200702020942.13518.berend@chalmers.se> Hi Ben, It will probably work, but it will be more expensive. If you use an implicit algorithm to solve the flow, it really pays off to have the boundary conditions implicit as well. Explicit boundary conditions mean you will need additional iterations, which is really unnecessary in your case. Why not put the dependency directly in the matrix? Berend. > Hi, > > somone suggested that I treat that face as a dirichlet boundary > condition. after 1 or a few iterations, the face value will be > updated and it will be repeated until covergerence. I wonder if that > is possible as well? > > It'll make the job much easier, although the iteration may take > longer... > > On 2/1/07, Barry Smith wrote: > > The glueing might be able to be handled by using periodic for that > > dimension of the DA you create. But this gets tricky if you have > > any nodes that have "an extra degree of freedom". > > > > Barry > > > > On Wed, 31 Jan 2007, Berend van Wachem wrote: > > > Hi Ben, > > > > > > The challenge in your problem is how you "glue" the C grid in > > > the back; there you will need to do some additional scattering. > > > I would set-up the IS for this, and then use that to scatter the > > > values into "ghostcells" which will be present on the block(s). > > > > > > Berend. > > > > > > > Thank you Berend! I'll go through DA again. I'm also looking > > > > at HYPRE. Its way of creating grids and linking them seems > > > > intuitive. Btw, is there a mailing list for HYPRE similar to > > > > PETSc to discuss problems? I find that their explanation are > > > > quite brief. > > > > > > > > I tried to install HYPRE 2.0 on windows using cygwin but it > > > > failed. I then install it as an external software thru PETSc. > > > > I think it's installing HYPRE 1.0 or something. But similarly, > > > > there's illegal operation. > > > > > > > > Installing HYPRE 2.0 on my school's linux worked, though > > > > there's seems to be some minor error. So what's the best way > > > > to employ multigird? Is it to install as an external software > > > > thru PETSc or just use HYPRE on its own? > > > > > > > > Btw, it will be great if you can send me parts of your code > > > > regarding DA. > > > > > > > > Thank you very much! > > > > > > > > On 1/26/07, Berend van Wachem wrote: > > > > > Hi, > > > > > > > > > > I am not an expert - but have used PETSc for both structured > > > > > and unstructured grids. > > > > > > > > > > When you use an unstructured code for a structured grid, > > > > > there is additional overhead (addressing, connectivity) > > > > > which is redundant; this information is not required for > > > > > solving on a structured grid. I would say this is maximum a > > > > > 10% efficiency loss for bigger problems - it does not affect > > > > > solving the matrix, only in gathering your coefficients. I > > > > > would not rewrite my CFD code for this. > > > > > > > > > > If you only deal with structured grids, using the PETSc DA > > > > > framework should work for you - you are not saving all > > > > > connectivity. 
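
(For illustration of the DA workflow mentioned above: a minimal sketch, not code from this thread; the grid size and variable names are made up, error checking is omitted, and the exact argument lists vary between PETSc versions - this follows the 2.3-era DA interface.)

  #include "petscda.h"

  int main(int argc,char **argv)
  {
    DA  da;
    Vec xglobal,xlocal;

    PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);

    /* 2D structured grid, 128 x 64 nodes, 1 unknown per node, stencil width 1;
       PETSc chooses the parallel decomposition */
    DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR,
               128,64,PETSC_DECIDE,PETSC_DECIDE,1,1,PETSC_NULL,PETSC_NULL,&da);

    /* global vector: one entry per owned node, no ghost points */
    DACreateGlobalVector(da,&xglobal);
    /* local vector: also holds ghost nodes owned by neighbouring processes */
    DACreateLocalVector(da,&xlocal);

    /* ... fill xglobal with field values ... */

    /* update ghost nodes before using neighbour values in local computations */
    DAGlobalToLocalBegin(da,xglobal,INSERT_VALUES,xlocal);
    DAGlobalToLocalEnd(da,xglobal,INSERT_VALUES,xlocal);

    VecDestroy(xlocal);
    VecDestroy(xglobal);
    DADestroy(da);
    PetscFinalize();
    return 0;
  }

The extra "glue" scatter for the C-grid cut that is mentioned above is not something the DA provides automatically; it would be built separately with an IS and a VecScatter, as described.
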
The DA framework is not difficult at all, > > > > > according to my opinion. Look at a few examples that come > > > > > with PETSc. I use a block structured solver - using multiple > > > > > DA's within one problem. Let me know if you are interested > > > > > in this, and I can send you parts of code. > > > > > > > > > > Multigrid is certainly possible (I would reccomend through > > > > > HYPRE, discussed on the mailinglist, although I still have > > > > > problems with it), but the question is how efficient it will > > > > > be for your CFD problem. For an efficient multigrid in CFD, > > > > > it is important to consider the coefficient structure > > > > > arising from the momentum equations - the grouping of cells > > > > > should occur following the advection term. Only then will > > > > > you achieve linear scaling with the problem size. For > > > > > instance, consider a rotating flow in a square box. Most > > > > > multigrid algorithms will group cells in "squares" which > > > > > will not lead to a significant improvement, as the flow > > > > > (advection, pressure grad) does not move in these squares. > > > > > In fact, to have an efficient multigrid algorithm, the cels > > > > > should be grouped along the circular flow. As this cannot be > > > > > seen directly from the pressure coefficients, I doubt any > > > > > "automatic" multigrid algorithm (in Hypre or Petsc) would be > > > > > able to capture this, but don't quote me on it - I am not > > > > > 100% sure. So concluding, if you want to do efficient > > > > > multigridding for CFD, you will need to point out which > > > > > cells are grouped into which structure, based upon the > > > > > upwind advection coefficients. > > > > > > > > > > Good luck, > > > > > > > > > > Berend. > > > > > > > > > > > Hi, > > > > > > > > > > > > I was discussing with another user in another forum > > > > > > (cfd-online.com) > > > > > > > > > > about > > > > > > > > > > > using PETSc in my cfd code. I am now using KSP to solve my > > > > > > momentum and poisson eqn by inserting values into the > > > > > > matrix. I was told that using PETSc > > > > > > this way is only for unstructured grids. It is very > > > > > > inefficient and much slower if I'm using it for my > > > > > > structured grid because I am not > > > > > > > > > > exploiting > > > > > > > > > > > the regular structure of my grid. > > > > > > > > > > > > Is that true? I'm solving flow around airfoil using > > > > > > c-grid. > > > > > > > > > > > > So how can I improve? Is it by using DA? I took a glance > > > > > > and it seems quite > > > > > > complicated. > > > > > > > > > > > > Also, is multigrid available in PETSc? Chapter 7 discusses > > > > > > about it but > > > > > > > > > > it > > > > > > > > > > > seems very brief. Is there a more elaborate tutorial > > > > > > besides that c examples? > > > > > > > > > > > > Hope someone can give me some ideas. > > > > > > > > > > > > Thank you. From jinzishuai at yahoo.com Fri Feb 2 15:22:21 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 2 Feb 2007 13:22:21 -0800 (PST) Subject: PETSc runs slower on a shared memory machine than on a cluster Message-ID: <419301.13342.qm@web36213.mail.mud.yahoo.com> Hi there, I am fairly new to PETSc but have 5 years of MPI programming already. I recently took on a project of analyzing a finite element code written in C with PETSc. 
I found out that on a shared-memory machine (60GB RAM, 16 CPUS), the code runs around 4 times slower than on a distributed memory cluster (4GB Ram, 4CPU/node), although they yield identical results. There are 1.6Million finite elements in the problem so it is a fairly large calculation. The total memory used is 3GBx16=48GB. Both the two systems run Linux as OS and the same code is compiled against the same version of MPICH-2 and PETSc. The shared-memory machine is actually a little faster than the cluster machines in terms of single process runs. I am surprised at this result since we usually tend to think that shared-memory would be much faster since the in-memory operation is much faster that the network communication. However, I read the PETSc FAQ and found that "the speed of sparse matrix computations is almost totally determined by the speed of the memory, not the speed of the CPU". This makes me wonder whether the poor performance of my code on a shared-memory machine is due to the competition of different process on the same memory bus. Since the code is still MPI based, a lot of data are moving around inside the memory. Is this a reasonable explanation of what I observed? Thank you very much. Shi ____________________________________________________________________________________ Do you Yahoo!? Everyone is raving about the all-new Yahoo! Mail beta. http://new.mail.yahoo.com From bsmith at mcs.anl.gov Fri Feb 2 15:38:39 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 2 Feb 2007 15:38:39 -0600 (CST) Subject: Non-uniform 2D mesh questions In-Reply-To: <20070130230718.29179.qmail@s402.sureserver.com> References: <20070130230718.29179.qmail@s402.sureserver.com> Message-ID: Yaron, Anything is possible :-) and maybe not terribly difficult to get started. You could use DAGetMatrx() to give you the properly pre-allocated "huge" Mat. Have each process loop over the "rectangular portion[s] of the domain" that it mostly owns (that is if a rectangular portion lies across two processes just assign it to one of them for this loop.) Then loop over the locations inside the rectangular portion calling MatSetValuesStencil() for that row of the huge matrix to put the entries from the smaller matrix INTO the huge matrix using the natural grid i,j coordindates (so not have to map the coordinates from the grid location to the location in the matrix). This may require some thought to get right but should require little coding (if you are writting hundreds and hundreds of lines of code then likely something is wrong). Good luck, Barry On Tue, 30 Jan 2007, yaron at oak-research.com wrote: > Barry- > So far I only thought of having a single large sparse matrix. > > Yaron > > > -------Original Message------- > From: Barry Smith > Subject: Re: Non-uniform 2D mesh questions > Sent: 30 Jan '07 10:58 > > > Yaron, > > Do you want to end up generating a single large sparse matrix? Like a > MPIAIJ > matrix? Or do you want to somehow not store the entire huge matrix but > still > be able to solve with the composed matrix? Or both? > > Barry > > > On Mon, 29 Jan 2007, [LINK: > http://webmail.oak-research.com/compose.php?to=yaron at oak-research.com] > yaron at oak-research.com wrote: > > > Barry- > > Yes, each block is a rectangular portion of the domain. 
> > Not so small though (more like 100 x 100 nodes)
> >
> > Yaron
> >
> > -------Original Message-------
> > From: Barry Smith
> > Subject: Re: Non-uniform 2D mesh questions
> > Sent: 29 Jan '07 19:40
> >
> > Yaron,
> >
> > Is each one of these "blocks" a small rectangular part of the
> > domain (like a 4 by 5 set of nodes)? I don't understand what you
> > want to do.
> >
> > Barry
> >
> > On Mon, 29 Jan 2007, yaron at oak-research.com wrote:
> >
> > > Hi all
> > > I have a laplace-type problem that's physically built from repeating
> > > instances of the same block.
> > > I'm creating matrices for the individual blocks, and I'd like to reuse
> > > the individual block matrices in order to compose the complete problem.
> > > (i.e. if there are 10K instances of 20 blocks, I'd like to build 20 matrices,
> > > then use them to compose the large complete matrix)
> > > Is a 2D DA the right object to do that? And if so, where can I find a
> > > small example of building the DA object in parallel, then using the
> > > different (for every instance) mappings of local nodes to global nodes
> > > in order to build the complete matrix?
> > >
> > > Thanks
> > > Yaron

From dalcinl at gmail.com Fri Feb 2 15:47:56 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 2 Feb 2007 18:47:56 -0300
Subject: PETSc runs slower on a shared memory machine than on a cluster
In-Reply-To: <419301.13342.qm@web36213.mail.mud.yahoo.com>
References: <419301.13342.qm@web36213.mail.mud.yahoo.com>
Message-ID:

On 2/2/07, Shi Jin wrote:
> I found out that on a shared-memory machine (60GB RAM,
> 16 CPUs), the code runs around 4 times slower than
> on a distributed memory cluster (4GB RAM, 4 CPUs/node),
> although they yield identical results.
> However, I read the PETSc FAQ and found that "the
> speed of sparse matrix computations is almost totally
> determined by the speed of the memory, not the speed
> of the CPU".
> This makes me wonder whether the poor performance of
> my code on a shared-memory machine is due to the
> competition of different processes on the same memory
> bus. Since the code is still MPI based, a lot of data
> are moving around inside the memory. Is this a
> reasonable explanation of what I observed?

There is a point that is not clear to me. When you run on your shared-memory machine...

- Are you running it as a 'sequential' program with a global, shared memory space?

- Or are you running it through MPI, as a distributed-memory application using MPI message passing (where shared memory is the underlying communication 'channel')?
-- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From balay at mcs.anl.gov Fri Feb 2 15:55:02 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 2 Feb 2007 15:55:02 -0600 (CST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <419301.13342.qm@web36213.mail.mud.yahoo.com> References: <419301.13342.qm@web36213.mail.mud.yahoo.com> Message-ID: There are 2 aspects to performance. - MPI performance [while message passing] - sequential performance for the numerical stuff. So it could be that the SMP box has better MPI performance. This can be verified with -log_summary from both the runs [and looking at VecScatter times] However with the sequential numerical codes - it primarily depends upon the bandwidth between the CPU and the memory. On the SMP box - depending upon how the memory subsystem is designed - the effective memory bandwidth per cpu could be a small fraction of the peak memory bandwidth [when all cpus are used] So you'll have to look at the memory subsystem design of each of these machines and compare the 'memory bandwidth per cpu]. The performance from log_summary - for ex: in MatMult will reflect this. [ including the above communication overhead] Satish On Fri, 2 Feb 2007, Shi Jin wrote: > Hi there, > > I am fairly new to PETSc but have 5 years of MPI > programming already. I recently took on a project of > analyzing a finite element code written in C with > PETSc. > I found out that on a shared-memory machine (60GB RAM, > 16 CPUS), the code runs around 4 times slower than > on a distributed memory cluster (4GB Ram, 4CPU/node), > although they yield identical results. > There are 1.6Million finite elements in the problem so > it is a fairly large calculation. The total memory > used is 3GBx16=48GB. > > Both the two systems run Linux as OS and the same code > is compiled against the same version of MPICH-2 and > PETSc. > > The shared-memory machine is actually a little faster > than the cluster machines in terms of single process > runs. > > I am surprised at this result since we usually tend to > think that shared-memory would be much faster since > the in-memory operation is much faster that the > network communication. > > However, I read the PETSc FAQ and found that "the > speed of sparse matrix computations is almost totally > determined by the speed of the memory, not the speed > of the CPU". > This makes me wonder whether the poor performance of > my code on a shared-memory machine is due to the > competition of different process on the same memory > bus. Since the code is still MPI based, a lot of data > are moving around inside the memory. Is this a > reasonable explanation of what I observed? > > Thank you very much. > > Shi > > > > ____________________________________________________________________________________ > Do you Yahoo!? > Everyone is raving about the all-new Yahoo! Mail beta. 
> http://new.mail.yahoo.com > > From balay at mcs.anl.gov Fri Feb 2 16:01:49 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 2 Feb 2007 16:01:49 -0600 (CST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: References: <419301.13342.qm@web36213.mail.mud.yahoo.com> Message-ID: On Fri, 2 Feb 2007, Satish Balay wrote: > However with the sequential numerical codes - it primarily depends > upon the bandwidth between the CPU and the memory. On the SMP box - > depending upon how the memory subsystem is designed - the effective > memory bandwidth per cpu could be a small fraction of the peak memory > bandwidth [when all cpus are used] > > The shared-memory machine is actually a little faster > > than the cluster machines in terms of single process > > runs. To understand this better - think of comparing the performance in the following 2 cases: - run the sequential code when no other job is on the machine. - run the sequential code when there is another [memory intensive] job using the other 15 nodes] In a distributed cluster the performance numbers for both cases will be same. For a SMP machine - the performance of the first run will be much better than the second one [because of the sharing of memory bandwidth with competing processors] Satish From jinzishuai at yahoo.com Fri Feb 2 17:02:38 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 2 Feb 2007 15:02:38 -0800 (PST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: Message-ID: <20070202230238.92601.qmail@web36206.mail.mud.yahoo.com> > There is a point which is not clear for me. > > When you run in your shared-memory machine... > > - Are you running your as a 'sequential' program > with a global,shared > memory space? > > - Or are you running it through MPI, as a > distributed memory > application using MPI message passing (where shared > mem is the > underlying communication 'channel') ? Thank you for replying. I run the code on a shared memory machine through MPI, just like what I do on a cluster. I simply did: petscmpirun -np 18 ./code I am not 100% sure whether MPICH-2 will automatically use shared memory as the underlying commnunication channel instead of the network but I know most MPI implementations are smart enough to do so (like LAM-MPI I used before). Could anyone confirm this? Thank you. Shi ____________________________________________________________________________________ Sucker-punch spam with award-winning protection. Try the free Yahoo! Mail Beta. http://advision.webevents.yahoo.com/mailbeta/features_spam.html From dalcinl at gmail.com Fri Feb 2 17:16:57 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 2 Feb 2007 20:16:57 -0300 Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <20070202230238.92601.qmail@web36206.mail.mud.yahoo.com> References: <20070202230238.92601.qmail@web36206.mail.mud.yahoo.com> Message-ID: On 2/2/07, Shi Jin wrote: > Thank you for replying. > I run the code on a shared memory machine through MPI, > just like what I do on a cluster. I simply did: > petscmpirun -np 18 ./code > > I am not 100% sure whether MPICH-2 will automatically > use shared memory as the underlying commnunication > channel instead of the network but I know most MPI > implementations are smart enough to do so (like > LAM-MPI I used before). Could anyone confirm this? Please read the following... 
http://www-unix.mcs.anl.gov/mpi/mpich/downloads/mpich2-doc-README.txt I think for shared-memory you should try to configure MPICH2 with the following: --with-device=ch3:shm If not, perhaps configure will default to --with-device=ch3:sock and MPICH2 will use TCP sockets. I hope this help you. Regards, -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From knepley at gmail.com Fri Feb 2 18:20:47 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 2 Feb 2007 18:20:47 -0600 Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <20070202230238.92601.qmail@web36206.mail.mud.yahoo.com> References: <20070202230238.92601.qmail@web36206.mail.mud.yahoo.com> Message-ID: On 2/2/07, Shi Jin wrote: > > > There is a point which is not clear for me. > > > > When you run in your shared-memory machine... > > > > - Are you running your as a 'sequential' program > > with a global,shared > > memory space? > > > > - Or are you running it through MPI, as a > > distributed memory > > application using MPI message passing (where shared > > mem is the > > underlying communication 'channel') ? > > Thank you for replying. > I run the code on a shared memory machine through MPI, > just like what I do on a cluster. I simply did: > petscmpirun -np 18 ./code > > I am not 100% sure whether MPICH-2 will automatically > use shared memory as the underlying commnunication > channel instead of the network but I know most MPI > implementations are smart enough to do so (like > LAM-MPI I used before). Could anyone confirm this? > Thank you. This is missing the point I think. It is just as Satish pointed out. Sparse matrix multiply is completely dominated by memory bandwidth and the shared memory machine has contention between the processes. I guarantee you that the performance problem is in the effective memory bandwidth per process. Matt Shi > > > > > > ____________________________________________________________________________________ > Sucker-punch spam with award-winning protection. > Try the free Yahoo! Mail Beta. > http://advision.webevents.yahoo.com/mailbeta/features_spam.html > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... URL: From jinzishuai at yahoo.com Sat Feb 3 15:46:29 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Sat, 3 Feb 2007 13:46:29 -0800 (PST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: Message-ID: <917118.25233.qm@web36202.mail.mud.yahoo.com> Thank you. I did the same runs again with -log_summary. Here is the part that I think is most important. 
On cluster: --- Event Stage 5: Projection Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ [x]rhsLu 99 1.0 2.3875e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.9e+01 7 0 0 0 0 14 0 0 0 0 0 VecMDot 133334 1.0 4.1386e+02 1.6 3.43e+08 1.6 0.0e+00 0.0e+00 1.3e+05 10 18 0 0 45 21 27 0 0 49 883 VecNorm 137829 1.0 6.9839e+01 1.5 1.27e+08 1.5 0.0e+00 0.0e+00 1.4e+05 2 1 0 0 46 4 2 0 0 51 350 VecScale 137928 1.0 5.5639e+00 1.1 5.79e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2197 VecCopy 4495 1.0 8.4510e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 142522 1.0 1.7712e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecAXPY 8990 1.0 9.9013e-01 1.1 4.34e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1610 VecMAXPY 137829 1.0 2.1687e+02 1.1 4.92e+08 1.1 0.0e+00 0.0e+00 0.0e+00 6 20 0 0 0 12 29 0 0 0 1793 VecScatterBegin 137829 1.0 2.1816e+01 1.9 0.00e+00 0.0 8.3e+05 3.4e+04 0.0e+00 0 0 91 74 0 1 0100100 0 0 VecScatterEnd 137730 1.0 3.0302e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecNormalize 137829 1.0 7.6565e+01 1.4 1.68e+08 1.4 0.0e+00 0.0e+00 1.4e+05 2 2 0 0 46 4 3 0 0 51 479 MatMult 137730 1.0 3.5652e+02 1.3 2.58e+08 1.2 8.3e+05 3.4e+04 0.0e+00 9 15 91 74 0 19 21100100 0 815 MatSolve 137829 1.0 5.0916e+02 1.2 1.56e+08 1.2 0.0e+00 0.0e+00 0.0e+00 13 14 0 0 0 28 20 0 0 0 531 MatGetRow 44110737 1.0 1.1846e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 7 0 0 0 0 0 KSPGMRESOrthog 133334 1.0 6.0430e+02 1.3 3.87e+08 1.3 0.0e+00 0.0e+00 1.3e+05 15 37 0 0 45 32 54 0 0 49 1209 KSPSolve 99 1.0 1.4336e+03 1.0 2.37e+08 1.0 8.3e+05 3.4e+04 2.7e+05 40 68 91 74 91 86100100100100 944 PCSetUpOnBlocks 99 1.0 3.2687e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 137829 1.0 5.3316e+02 1.2 1.50e+08 1.2 0.0e+00 0.0e+00 0.0e+00 14 14 0 0 0 30 20 0 0 0 507 --------------------------------------------------- On the shared memory machine: --- Event Stage 5: Projection Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ [x]rhsLu 99 1.0 2.0673e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 9.9e+01 5 0 0 0 0 9 0 0 0 0 0 VecMDot 133334 1.0 7.0932e+02 2.1 2.70e+08 2.1 0.0e+00 0.0e+00 1.3e+05 11 18 0 0 45 22 27 0 0 49 515 VecNorm 137829 1.0 1.2860e+02 7.0 3.32e+08 7.0 0.0e+00 0.0e+00 1.4e+05 2 1 0 0 46 3 2 0 0 51 190 VecScale 137928 1.0 5.0018e+00 1.0 6.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2444 VecCopy 4495 1.0 1.4161e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 142522 1.0 1.9602e+01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecAXPY 8990 1.0 1.5128e+00 1.4 3.67e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1054 VecMAXPY 137829 1.0 3.5204e+02 1.4 3.82e+08 1.4 0.0e+00 0.0e+00 0.0e+00 7 20 0 0 0 13 29 0 0 0 1105 VecScatterBegin 137829 1.0 1.4310e+01 2.2 0.00e+00 0.0 8.3e+05 3.4e+04 0.0e+00 0 0 91 74 0 0 0100100 0 0 VecScatterEnd 137730 1.0 1.5035e+02 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0 VecNormalize 137829 1.0 1.3453e+02 5.6 3.80e+08 5.6 0.0e+00 0.0e+00 1.4e+05 2 2 0 0 46 3 3 0 0 51 272 
MatMult 137730 1.0 5.4179e+02 1.5 1.99e+08 1.4 8.3e+05 3.4e+04 0.0e+00 11 15 91 74 0 21 21100100 0 536 MatSolve 137829 1.0 7.9682e+02 1.4 1.18e+08 1.4 0.0e+00 0.0e+00 0.0e+00 16 14 0 0 0 30 20 0 0 0 339 MatGetRow 44110737 1.0 1.0296e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 5 0 0 0 0 0 KSPGMRESOrthog 133334 1.0 9.4927e+02 1.4 2.75e+08 1.4 0.0e+00 0.0e+00 1.3e+05 18 37 0 0 45 34 54 0 0 49 770 KSPSolve 99 1.0 2.0562e+03 1.0 1.65e+08 1.0 8.3e+05 3.4e+04 2.7e+05 47 68 91 74 91 91100100100100 658 PCSetUpOnBlocks 99 1.0 3.3998e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 137829 1.0 8.2326e+02 1.4 1.14e+08 1.4 0.0e+00 0.0e+00 0.0e+00 16 14 0 0 0 31 20 0 0 0 328 I do see that the cluster run is faster than the shared-memory case. However, I am not sure how I can tell the reason for this behavior is due to the memory subsystem. I don't know what evidence in the log to look for. Thanks again. Shi --- Satish Balay wrote: > There are 2 aspects to performance. > > - MPI performance [while message passing] > - sequential performance for the numerical stuff. > > So it could be that the SMP box has better MPI > performance. This can > be verified with -log_summary from both the runs > [and looking at > VecScatter times] > > However with the sequential numerical codes - it > primarily depends > upon the bandwidth between the CPU and the memory. > On the SMP box - > depending upon how the memory subsystem is designed > - the effective > memory bandwidth per cpu could be a small fraction > of the peak memory > bandwidth [when all cpus are used] > > So you'll have to look at the memory subsystem > design of each of these > machines and compare the 'memory bandwidth per cpu]. > The performance > from log_summary - for ex: in MatMult will reflect > this. [ including > the above communication overhead] > > Satish > > On Fri, 2 Feb 2007, Shi Jin wrote: > > > Hi there, > > > > I am fairly new to PETSc but have 5 years of MPI > > programming already. I recently took on a project > of > > analyzing a finite element code written in C with > > PETSc. > > I found out that on a shared-memory machine (60GB > RAM, > > 16 CPUS), the code runs around 4 times slower > than > > on a distributed memory cluster (4GB Ram, > 4CPU/node), > > although they yield identical results. > > There are 1.6Million finite elements in the > problem so > > it is a fairly large calculation. The total memory > > used is 3GBx16=48GB. > > > > Both the two systems run Linux as OS and the same > code > > is compiled against the same version of MPICH-2 > and > > PETSc. > > > > The shared-memory machine is actually a little > faster > > than the cluster machines in terms of single > process > > runs. > > > > I am surprised at this result since we usually > tend to > > think that shared-memory would be much faster > since > > the in-memory operation is much faster that the > > network communication. > > > > However, I read the PETSc FAQ and found that "the > > speed of sparse matrix computations is almost > totally > > determined by the speed of the memory, not the > speed > > of the CPU". > > This makes me wonder whether the poor performance > of > > my code on a shared-memory machine is due to the > > competition of different process on the same > memory > > bus. Since the code is still MPI based, a lot of > data > > are moving around inside the memory. Is this a > > reasonable explanation of what I observed? > > > > Thank you very much. 
> >
> > Shi

From jinzishuai at yahoo.com Sat Feb 3 15:50:01 2007
From: jinzishuai at yahoo.com (Shi Jin)
Date: Sat, 3 Feb 2007 13:50:01 -0800 (PST)
Subject: PETSc runs slower on a shared memory machine than on a cluster
In-Reply-To:
Message-ID: <323724.17234.qm@web36208.mail.mud.yahoo.com>

Thank you. I rebuilt MPICH-2 with --with-device=ch3:shm and --with-pm=gforker. I did see a slight improvement in speed. However, compared with the cluster runs, the shared-memory performance is still not nearly as good. So I think the problem is indeed in the memory subsystem, as Satish said.

Shi

--- Lisandro Dalcin wrote:
> Please read the following...
>
> http://www-unix.mcs.anl.gov/mpi/mpich/downloads/mpich2-doc-README.txt
>
> I think for shared memory you should try to configure MPICH2 with the following:
>
> --with-device=ch3:shm
>
> If not, configure will probably default to --with-device=ch3:sock and MPICH2 will use TCP sockets.
>
> I hope this helps you.
>
> Regards,
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594

From bsmith at mcs.anl.gov Sat Feb 3 18:57:29 2007
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 3 Feb 2007 18:57:29 -0600 (CST)
Subject: PETSc runs slower on a shared memory machine than on a cluster
In-Reply-To: <917118.25233.qm@web36202.mail.mud.yahoo.com>
References: <917118.25233.qm@web36202.mail.mud.yahoo.com>
Message-ID:

  Total Flop rate     Cluster    Shared memory
  VecMAXPY               1793             1105
  MatSolve                815              339

The vector operations in MAXPY and the triangular solves in MatSolve are memory bandwidth limited (triangular solves extremely so). When all the processors are demanding their needed memory bandwidth in the triangular solves, the performance suffers: 339 vs 815 compared with the distributed memory case, where each processor has its own memory.

   Barry

On Sat, 3 Feb 2007, Shi Jin wrote:

> Thank you.
> I did the same runs again with -log_summary. Here is
> the part that I think is most important.
> On cluster: > --- Event Stage 5: Projection > Event Count Time (sec) > Flops/sec --- Global --- --- > Stage --- Total > Max Ratio Max Ratio Max > Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M > %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > [x]rhsLu 99 1.0 2.3875e+02 1.0 0.00e+00 > 0.0 0.0e+00 0.0e+00 9.9e+01 7 0 0 0 0 14 0 0 > 0 0 0 > VecMDot 133334 1.0 4.1386e+02 1.6 3.43e+08 > 1.6 0.0e+00 0.0e+00 1.3e+05 10 18 0 0 45 21 27 0 > 0 49 883 > VecNorm 137829 1.0 6.9839e+01 1.5 1.27e+08 > 1.5 0.0e+00 0.0e+00 1.4e+05 2 1 0 0 46 4 2 0 > 0 51 350 > VecScale 137928 1.0 5.5639e+00 1.1 5.79e+08 > 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 > 0 0 2197 > VecCopy 4495 1.0 8.4510e-01 1.1 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > VecSet 142522 1.0 1.7712e+01 1.5 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 > 0 0 0 > VecAXPY 8990 1.0 9.9013e-01 1.1 4.34e+08 > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 1610 > VecMAXPY 137829 1.0 2.1687e+02 1.1 4.92e+08 > 1.1 0.0e+00 0.0e+00 0.0e+00 6 20 0 0 0 12 29 0 > 0 0 1793 > VecScatterBegin 137829 1.0 2.1816e+01 1.9 0.00e+00 > 0.0 8.3e+05 3.4e+04 0.0e+00 0 0 91 74 0 1 > 0100100 0 0 > VecScatterEnd 137730 1.0 3.0302e+01 1.6 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 > 0 0 0 > VecNormalize 137829 1.0 7.6565e+01 1.4 1.68e+08 > 1.4 0.0e+00 0.0e+00 1.4e+05 2 2 0 0 46 4 3 0 > 0 51 479 > MatMult 137730 1.0 3.5652e+02 1.3 2.58e+08 > 1.2 8.3e+05 3.4e+04 0.0e+00 9 15 91 74 0 19 > 21100100 0 815 > MatSolve 137829 1.0 5.0916e+02 1.2 1.56e+08 > 1.2 0.0e+00 0.0e+00 0.0e+00 13 14 0 0 0 28 20 0 > 0 0 531 > MatGetRow 44110737 1.0 1.1846e+02 1.0 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 7 0 0 > 0 0 0 > KSPGMRESOrthog 133334 1.0 6.0430e+02 1.3 3.87e+08 > 1.3 0.0e+00 0.0e+00 1.3e+05 15 37 0 0 45 32 54 0 > 0 49 1209 > KSPSolve 99 1.0 1.4336e+03 1.0 2.37e+08 > 1.0 8.3e+05 3.4e+04 2.7e+05 40 68 91 74 91 > 86100100100100 944 > PCSetUpOnBlocks 99 1.0 3.2687e-04 1.2 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > PCApply 137829 1.0 5.3316e+02 1.2 1.50e+08 > 1.2 0.0e+00 0.0e+00 0.0e+00 14 14 0 0 0 30 20 0 > 0 0 507 > --------------------------------------------------- > On the shared memory machine: > --- Event Stage 5: Projection > Event Count Time (sec) > Flops/sec --- Global --- --- > Stage --- Total > Max Ratio Max Ratio Max > Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M > %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > [x]rhsLu 99 1.0 2.0673e+02 1.0 0.00e+00 > 0.0 0.0e+00 0.0e+00 9.9e+01 5 0 0 0 0 9 0 0 > 0 0 0 > VecMDot 133334 1.0 7.0932e+02 2.1 2.70e+08 > 2.1 0.0e+00 0.0e+00 1.3e+05 11 18 0 0 45 22 27 0 > 0 49 515 > VecNorm 137829 1.0 1.2860e+02 7.0 3.32e+08 > 7.0 0.0e+00 0.0e+00 1.4e+05 2 1 0 0 46 3 2 0 > 0 51 190 > VecScale 137928 1.0 5.0018e+00 1.0 6.36e+08 > 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 > 0 0 2444 > VecCopy 4495 1.0 1.4161e+00 1.8 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > VecSet 142522 1.0 1.9602e+01 2.1 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 > 0 0 0 > VecAXPY 8990 1.0 1.5128e+00 1.4 3.67e+08 > 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 1054 > VecMAXPY 137829 1.0 3.5204e+02 1.4 3.82e+08 > 1.4 0.0e+00 0.0e+00 0.0e+00 7 20 0 0 0 13 29 0 > 0 0 1105 > VecScatterBegin 137829 1.0 1.4310e+01 2.2 0.00e+00 > 0.0 8.3e+05 3.4e+04 0.0e+00 0 0 91 74 0 0 > 0100100 0 0 > 
VecScatterEnd 137730 1.0 1.5035e+02 6.5 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 > 0 0 0 > VecNormalize 137829 1.0 1.3453e+02 5.6 3.80e+08 > 5.6 0.0e+00 0.0e+00 1.4e+05 2 2 0 0 46 3 3 0 > 0 51 272 > MatMult 137730 1.0 5.4179e+02 1.5 1.99e+08 > 1.4 8.3e+05 3.4e+04 0.0e+00 11 15 91 74 0 21 > 21100100 0 536 > MatSolve 137829 1.0 7.9682e+02 1.4 1.18e+08 > 1.4 0.0e+00 0.0e+00 0.0e+00 16 14 0 0 0 30 20 0 > 0 0 339 > MatGetRow 44110737 1.0 1.0296e+02 1.0 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 5 0 0 > 0 0 0 > KSPGMRESOrthog 133334 1.0 9.4927e+02 1.4 2.75e+08 > 1.4 0.0e+00 0.0e+00 1.3e+05 18 37 0 0 45 34 54 0 > 0 49 770 > KSPSolve 99 1.0 2.0562e+03 1.0 1.65e+08 > 1.0 8.3e+05 3.4e+04 2.7e+05 47 68 91 74 91 > 91100100100100 658 > PCSetUpOnBlocks 99 1.0 3.3998e-04 1.5 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > PCApply 137829 1.0 8.2326e+02 1.4 1.14e+08 > 1.4 0.0e+00 0.0e+00 0.0e+00 16 14 0 0 0 31 20 0 > 0 0 328 > > I do see that the cluster run is faster than the > shared-memory case. However, I am not sure how I can > tell the reason for this behavior is due to the memory > subsystem. I don't know what evidence in the log to > look for. > Thanks again. > > Shi > --- Satish Balay wrote: > > > There are 2 aspects to performance. > > > > - MPI performance [while message passing] > > - sequential performance for the numerical stuff. > > > > So it could be that the SMP box has better MPI > > performance. This can > > be verified with -log_summary from both the runs > > [and looking at > > VecScatter times] > > > > However with the sequential numerical codes - it > > primarily depends > > upon the bandwidth between the CPU and the memory. > > On the SMP box - > > depending upon how the memory subsystem is designed > > - the effective > > memory bandwidth per cpu could be a small fraction > > of the peak memory > > bandwidth [when all cpus are used] > > > > So you'll have to look at the memory subsystem > > design of each of these > > machines and compare the 'memory bandwidth per cpu]. > > The performance > > from log_summary - for ex: in MatMult will reflect > > this. [ including > > the above communication overhead] > > > > Satish > > > > On Fri, 2 Feb 2007, Shi Jin wrote: > > > > > Hi there, > > > > > > I am fairly new to PETSc but have 5 years of MPI > > > programming already. I recently took on a project > > of > > > analyzing a finite element code written in C with > > > PETSc. > > > I found out that on a shared-memory machine (60GB > > RAM, > > > 16 CPUS), the code runs around 4 times slower > > than > > > on a distributed memory cluster (4GB Ram, > > 4CPU/node), > > > although they yield identical results. > > > There are 1.6Million finite elements in the > > problem so > > > it is a fairly large calculation. The total memory > > > used is 3GBx16=48GB. > > > > > > Both the two systems run Linux as OS and the same > > code > > > is compiled against the same version of MPICH-2 > > and > > > PETSc. > > > > > > The shared-memory machine is actually a little > > faster > > > than the cluster machines in terms of single > > process > > > runs. > > > > > > I am surprised at this result since we usually > > tend to > > > think that shared-memory would be much faster > > since > > > the in-memory operation is much faster that the > > > network communication. > > > > > > However, I read the PETSc FAQ and found that "the > > > speed of sparse matrix computations is almost > > totally > > > determined by the speed of the memory, not the > > speed > > > of the CPU". 
> > > This makes me wonder whether the poor performance > > of > > > my code on a shared-memory machine is due to the > > > competition of different process on the same > > memory > > > bus. Since the code is still MPI based, a lot of > > data > > > are moving around inside the memory. Is this a > > > reasonable explanation of what I observed? > > > > > > Thank you very much. > > > > > > Shi > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Do you Yahoo!? > > > Everyone is raving about the all-new Yahoo! Mail > > beta. > > > http://new.mail.yahoo.com > > > > > > > > > > > > > > > ____________________________________________________________________________________ > Need a quick answer? Get one in minutes from people who know. > Ask your question on www.Answers.yahoo.com > > From balay at mcs.anl.gov Sat Feb 3 19:00:04 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 3 Feb 2007 19:00:04 -0600 (CST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <917118.25233.qm@web36202.mail.mud.yahoo.com> References: <917118.25233.qm@web36202.mail.mud.yahoo.com> Message-ID: On Sat, 3 Feb 2007, Shi Jin wrote: > I do see that the cluster run is faster than the shared-memory > case. However, I am not sure how I can tell the reason for this > behavior is due to the memory subsystem. I don't know what evidence > in the log to look for. There were too many linewraps in the e-mailed text. Its best to send such text as attachments so that the format is preserved [and readable] Event Count Time (sec) Flops/sec --- Global --- ---Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ VecScatterBegin 137829 1.0 2.1816e+01 1.9 0.00e+00 0.0 8.3e+05 3.4e+04 0.0e+00 0 0 91 74 0 1 0100100 0 0 VecScatterEnd 137730 1.0 3.0302e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatMult 137730 1.0 3.5652e+02 1.3 2.58e+08 1.2 8.3e+05 3.4e+04 0.0e+00 9 15 91 74 0 1921100100 0 815 VecScatterBegin 137829 1.0 1.4310e+01 2.2 0.00e+00 0.0 8.3e+05 3.4e+04 0.0e+00 0 0 91 74 0 0 0100100 0 0 VecScatterEnd 137730 1.0 1.5035e+02 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0 MatMult 137730 1.0 5.4179e+02 1.5 1.99e+08 1.4 8.3e+05 3.4e+04 0.0e+00 11 15 91 74 0 2121100100 0 536 Just looking at the time [in seconds] for VecScatterBegin() ,VecScatterEnd() ,MatMult() [which is the 4th column in the table] we have: [time in seconds] Cluster SMP VecScatterBegin 21 14 VecScatterEnd 30 150 MatMult 356 541 ----------------------------------- And MatMult is basically some local computation + Communication [which is scatter time], then if you consider just the local coputation time - and not the communication time, its its '356 -(21+30)' on the cluster and '541-(14+150)' on the SMP box. ----------------------------------- Communication cost 51 164 MatMult - (comm) 305 377 Considering this info - we can conclude the following: ** the communication cost on the the SMP box [164 seconds] is lot higher than communication cost on the cluster [51 seconds]. Part of the issue here is the load balance between all procs. 
[This is shown by the 5th column in the table] [load balance ratio] Cluster SMP VecScatterBegin 1.9 2.2 VecScatterEnd 1.6 6.5 MatMult 1.3 1.5 Somehow things are more balanced on the cluster than on the SMP, causing some procs to run slower than others - resulting in higher communication cost on the SMP box. ** The numerical part of MatMult is faster on the cluster [305 seconds] compared to the SMP box [377 seconds]. This is very likely due to the memory bandwidth issues. So both computation and communicaton times are better on the cluster [for MatMult - which is an essential kernel in sparse matrix solve]. Satish From dalcinl at gmail.com Sat Feb 3 19:37:42 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sat, 3 Feb 2007 22:37:42 -0300 Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <323724.17234.qm@web36208.mail.mud.yahoo.com> References: <323724.17234.qm@web36208.mail.mud.yahoo.com> Message-ID: On 2/3/07, Shi Jin wrote: > Thank you > I rebuilt MPICH-2 with --with-device=ch3:shm and > --with-pm=gforker > I did see a slight improvement in speed. However, > compared with the cluster runs, the shared-memory > performance is still not as good at all. > So I think the problem is indeed in the memory > subsystem as Satith said. Shi, can you provide me some more info about all this? - What kind of problem are you solving? - Are you using MATMPIAIJ or MATMPIBAIJ? - What do you use to partition your problem (ParMetis)? - How many processes do you have in your run (-np option) ? - When you run in your cluster, you launc 1 process in each CPU of your node? I mean, do you have 4 processes runing in each node? - What kind of network do you have in your cluster? GiE? or something better? I ask all this regarding previous comments of Barry and Shatish. If you have 4 processes running on each node, them surely communicate each other using the loopback interface, and this will have a bandwidth similar to your memory bandwidth, so in your case not all communication will go through the wires... Sorry for my English, Regards, -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From jinzishuai at yahoo.com Mon Feb 5 17:23:24 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Mon, 5 Feb 2007 15:23:24 -0800 (PST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <419301.13342.qm@web36213.mail.mud.yahoo.com> Message-ID: <321985.44474.qm@web36201.mail.mud.yahoo.com> Hi there, I have made some new progress on the issue of SMP performance. Since my shared memory machine is a 8 dual-core Opteron machine. I think the two cores on a single CPU chip shares the memory bandwidth. Therefore, if I can avoid using the same core on the chip, I can get some performance improvement. Indeed, I am able to do this by the linux command taskset. Here is what I did: petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ../spAF This way, I specifically ask the processes to be run on the first core on the CPUs. By doing this, my performance is doubled compared with the simple petscmpirun -n 8 ../spAF So this test shows that we do suffer from the competition of resources of multiple processes, especially when we use 16 processes. 
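
(The same pinning can also be done from inside the program instead of with taskset on the command line. Below is a minimal, Linux-specific sketch; the assumption that logical CPUs 0,2,4,... sit on different sockets is exactly the assumption behind the taskset command above, and it needs to be checked against the actual machine topology, e.g. in /proc/cpuinfo.)

  #define _GNU_SOURCE
  #include <sched.h>
  #include "petsc.h"

  int main(int argc,char **argv)
  {
    int       rank;
    cpu_set_t mask;

    PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);
    MPI_Comm_rank(PETSC_COMM_WORLD,&rank);

    /* bind this MPI process to logical CPU 2*rank, i.e. one core per socket
       for ranks 0..7 on an 8-socket dual-core box */
    CPU_ZERO(&mask);
    CPU_SET(2*rank,&mask);
    if (sched_setaffinity(0,sizeof(mask),&mask)) {
      PetscPrintf(PETSC_COMM_SELF,"[%d] could not set CPU affinity\n",rank);
    }

    /* ... rest of the application ... */

    PetscFinalize();
    return 0;
  }
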
However, I should point out that even with the help taskset, the shared-memory performance is still 30% less than that on the cluster. I am not sure whether this problem exists specifically for the AMD machines or it applys to any shared-memory architecture. Thanks. Shi --- Shi Jin wrote: > Hi there, > > I am fairly new to PETSc but have 5 years of MPI > programming already. I recently took on a project of > analyzing a finite element code written in C with > PETSc. > I found out that on a shared-memory machine (60GB > RAM, > 16 CPUS), the code runs around 4 times slower > than > on a distributed memory cluster (4GB Ram, > 4CPU/node), > although they yield identical results. > There are 1.6Million finite elements in the problem > so > it is a fairly large calculation. The total memory > used is 3GBx16=48GB. > > Both the two systems run Linux as OS and the same > code > is compiled against the same version of MPICH-2 and > PETSc. > > The shared-memory machine is actually a little > faster > than the cluster machines in terms of single process > runs. > > I am surprised at this result since we usually tend > to > think that shared-memory would be much faster since > the in-memory operation is much faster that the > network communication. > > However, I read the PETSc FAQ and found that "the > speed of sparse matrix computations is almost > totally > determined by the speed of the memory, not the speed > of the CPU". > This makes me wonder whether the poor performance of > my code on a shared-memory machine is due to the > competition of different process on the same memory > bus. Since the code is still MPI based, a lot of > data > are moving around inside the memory. Is this a > reasonable explanation of what I observed? > > Thank you very much. > > Shi > > > > ____________________________________________________________________________________ > Do you Yahoo!? > Everyone is raving about the all-new Yahoo! Mail > beta. > http://new.mail.yahoo.com > > ____________________________________________________________________________________ Expecting? Get great news right away with email Auto-Check. Try the Yahoo! Mail Beta. http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html From balay at mcs.anl.gov Mon Feb 5 18:33:15 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 5 Feb 2007 18:33:15 -0600 (CST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <321985.44474.qm@web36201.mail.mud.yahoo.com> References: <321985.44474.qm@web36201.mail.mud.yahoo.com> Message-ID: A couple of comments: - with the dual core opteron - the memorybandwith per core is now reduced by half - so the performance suffers. However memory bandwidth across CPUs is scalable. [6.4 Gb/s per each node or 3.2Gb/s per core] - Current generation Intel Core 2 duo appears to claim having sufficient bandwidth [15.3Gb/s per node = 7.6Gb/s per core?] so from this bandwidth number - this chip might do better than the AMD chip. However I'm not sure if there is a SMP with this chip - which has scalable memory system [across say 8 nodes - as you currently have..] - Older intel SMP boxes has a single memory bank shared across all the CPUs [so effective bandwidth per CPU was pretty small. Optrons' scalable architecture looked much better than the older intel SMPs] - From previous log_summary - part of the inefficiency of the SMP box [when compared to the cluster] was in the MPI performance. Do you still see this effect in the '-np 8' runs? 
If so this could be the [part of the] reason for this 30% reduction in performance. Satish On Mon, 5 Feb 2007, Shi Jin wrote: > Hi there, > > I have made some new progress on the issue of SMP > performance. Since my shared memory machine is a 8 > dual-core Opteron machine. I think the two cores on a > single CPU chip shares the memory bandwidth. > Therefore, if I can avoid using the same core on the > chip, I can get some performance improvement. Indeed, > I am able to do this by the linux command taskset. > Here is what I did: > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ../spAF > This way, I specifically ask the processes to be run > on the first core on the CPUs. > By doing this, my performance is doubled compared with > the simple petscmpirun -n 8 ../spAF > > So this test shows that we do suffer from the > competition of resources of multiple processes, > especially when we use 16 processes. > > However, I should point out that even with the help > taskset, the shared-memory performance is still 30% > less than that on the cluster. > > I am not sure whether this problem exists specifically > for the AMD machines or it applys to any shared-memory > architecture. > > Thanks. > Shi > > --- Shi Jin wrote: > > > Hi there, > > > > I am fairly new to PETSc but have 5 years of MPI > > programming already. I recently took on a project of > > analyzing a finite element code written in C with > > PETSc. > > I found out that on a shared-memory machine (60GB > > RAM, > > 16 CPUS), the code runs around 4 times slower > > than > > on a distributed memory cluster (4GB Ram, > > 4CPU/node), > > although they yield identical results. > > There are 1.6Million finite elements in the problem > > so > > it is a fairly large calculation. The total memory > > used is 3GBx16=48GB. > > > > Both the two systems run Linux as OS and the same > > code > > is compiled against the same version of MPICH-2 and > > PETSc. > > > > The shared-memory machine is actually a little > > faster > > than the cluster machines in terms of single process > > runs. > > > > I am surprised at this result since we usually tend > > to > > think that shared-memory would be much faster since > > the in-memory operation is much faster that the > > network communication. > > > > However, I read the PETSc FAQ and found that "the > > speed of sparse matrix computations is almost > > totally > > determined by the speed of the memory, not the speed > > of the CPU". > > This makes me wonder whether the poor performance of > > my code on a shared-memory machine is due to the > > competition of different process on the same memory > > bus. Since the code is still MPI based, a lot of > > data > > are moving around inside the memory. Is this a > > reasonable explanation of what I observed? > > > > Thank you very much. > > > > Shi > > > > > > > > > ____________________________________________________________________________________ > > Do you Yahoo!? > > Everyone is raving about the all-new Yahoo! Mail > > beta. > > http://new.mail.yahoo.com > > > > > > > > > ____________________________________________________________________________________ > Expecting? Get great news right away with email Auto-Check. > Try the Yahoo! Mail Beta. 
> http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html > > From balay at mcs.anl.gov Mon Feb 5 19:05:46 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 5 Feb 2007 19:05:46 -0600 (CST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: References: <321985.44474.qm@web36201.mail.mud.yahoo.com> Message-ID: One more comment in regards to single core vs dual core opteron: There are two ways to evaluate the performance. Performance per core - or performance for the price [of the machine]. Ideally we'd like the performance per core be scalable [for publishing pretty graphs]. However the dual core machine does not cost twice the cost of single core machine. [Its probably costs 10-30% more]. So realistically - if one can get the same factor of improvement in performance with 16nodes vs 8nodes, one can consider the dual core machine as providing reasonable performance. Satish On Mon, 5 Feb 2007, Satish Balay wrote: > A couple of comments: > > - with the dual core opteron - the memorybandwith per core is now > reduced by half - so the performance suffers. However memory > bandwidth across CPUs is scalable. [6.4 Gb/s per each node or 3.2Gb/s > per core] > > - Current generation Intel Core 2 duo appears to claim having > sufficient bandwidth [15.3Gb/s per node = 7.6Gb/s per core?] so from > this bandwidth number - this chip might do better than the AMD > chip. However I'm not sure if there is a SMP with this chip - which > has scalable memory system [across say 8 nodes - as you currently > have..] > > - Older intel SMP boxes has a single memory bank shared across all the > CPUs [so effective bandwidth per CPU was pretty small. Optrons' > scalable architecture looked much better than the older intel SMPs] > > - From previous log_summary - part of the inefficiency of the SMP box > [when compared to the cluster] was in the MPI performance. Do you > still see this effect in the '-np 8' runs? If so this could be the > [part of the] reason for this 30% reduction in performance. > > Satish > > On Mon, 5 Feb 2007, Shi Jin wrote: > > > Hi there, > > > > I have made some new progress on the issue of SMP > > performance. Since my shared memory machine is a 8 > > dual-core Opteron machine. I think the two cores on a > > single CPU chip shares the memory bandwidth. > > Therefore, if I can avoid using the same core on the > > chip, I can get some performance improvement. Indeed, > > I am able to do this by the linux command taskset. > > Here is what I did: > > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ../spAF > > This way, I specifically ask the processes to be run > > on the first core on the CPUs. > > By doing this, my performance is doubled compared with > > the simple petscmpirun -n 8 ../spAF > > > > So this test shows that we do suffer from the > > competition of resources of multiple processes, > > especially when we use 16 processes. > > > > However, I should point out that even with the help > > taskset, the shared-memory performance is still 30% > > less than that on the cluster. > > > > I am not sure whether this problem exists specifically > > for the AMD machines or it applys to any shared-memory > > architecture. > > > > Thanks. > > Shi > > > > --- Shi Jin wrote: > > > > > Hi there, > > > > > > I am fairly new to PETSc but have 5 years of MPI > > > programming already. I recently took on a project of > > > analyzing a finite element code written in C with > > > PETSc. 
> > > I found out that on a shared-memory machine (60GB > > > RAM, > > > 16 CPUS), the code runs around 4 times slower > > > than > > > on a distributed memory cluster (4GB Ram, > > > 4CPU/node), > > > although they yield identical results. > > > There are 1.6Million finite elements in the problem > > > so > > > it is a fairly large calculation. The total memory > > > used is 3GBx16=48GB. > > > > > > Both the two systems run Linux as OS and the same > > > code > > > is compiled against the same version of MPICH-2 and > > > PETSc. > > > > > > The shared-memory machine is actually a little > > > faster > > > than the cluster machines in terms of single process > > > runs. > > > > > > I am surprised at this result since we usually tend > > > to > > > think that shared-memory would be much faster since > > > the in-memory operation is much faster that the > > > network communication. > > > > > > However, I read the PETSc FAQ and found that "the > > > speed of sparse matrix computations is almost > > > totally > > > determined by the speed of the memory, not the speed > > > of the CPU". > > > This makes me wonder whether the poor performance of > > > my code on a shared-memory machine is due to the > > > competition of different process on the same memory > > > bus. Since the code is still MPI based, a lot of > > > data > > > are moving around inside the memory. Is this a > > > reasonable explanation of what I observed? > > > > > > Thank you very much. > > > > > > Shi > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Do you Yahoo!? > > > Everyone is raving about the all-new Yahoo! Mail > > > beta. > > > http://new.mail.yahoo.com > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > Expecting? Get great news right away with email Auto-Check. > > Try the Yahoo! Mail Beta. > > http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html > > > > > > From yaron at oak-research.com Wed Feb 7 00:33:25 2007 From: yaron at oak-research.com (yaron at oak-research.com) Date: Tue, 06 Feb 2007 22:33:25 -0800 Subject: Non-uniform 2D mesh questions Message-ID: <20070207063325.20285.qmail@s402.sureserver.com> Barry- Maybe i'd better provide more details on what I'm trying to do *) I'm modeling current flowing through several different "block types", each of which describes a section of a semiconductor device. Each block type has a different geometry, which is triangulated to create an AIJ matrix (Each row/column in the matrix represents a coordinate, and the matrix values represent electrical admittance). There are about 100 different types of these blocks, and since they have quite convoluted geometries , their triangulation takes quite a while. *) My complete problem is composed of many (up to 10K) tiles, each of which is one of the 100 blocks . I want to reuse the triangulation which was done for each of the block, do I'd like to have a way of taking the matrix objects of the individual blcks, and combine them into a large matrix, taking into account their relative locations. *) This means that for each block instance, I would need to to translate every internal coordinate/node, and map it to a global coordinate/node. *) Once I have a mapping of local to global indices, I'd like to take the matrix values of the instances, and combine them to form a large matrix which describes the complete problem. So my question is : *) What data structures (DA/IS/AO/???) 
should I use to achieve the above? Best Regards Yaron -------Original Message------- From: Barry Smith Subject: Re: Non-uniform 2D mesh questions Sent: 02 Feb '07 13:38 Yaron, Anything is possible :-) and maybe not terribly difficult to get started. You could use DAGetMatrix() to give you the properly pre-allocated "huge" Mat. Have each process loop over the "rectangular portion[s] of the domain" that it mostly owns (that is, if a rectangular portion lies across two processes, just assign it to one of them for this loop). Then loop over the locations inside the rectangular portion calling MatSetValuesStencil() for that row of the huge matrix to put the entries from the smaller matrix INTO the huge matrix using the natural grid i,j coordinates (so you do not have to map the coordinates from the grid location to the location in the matrix). This may require some thought to get right but should require little coding (if you are writing hundreds and hundreds of lines of code then likely something is wrong). Good luck, Barry On Tue, 30 Jan 2007, yaron at oak-research.com wrote: > Barry- > So far I only thought of having a single large sparse matrix. > > Yaron > > > -------Original Message------- > From: Barry Smith > Subject: Re: Non-uniform 2D mesh questions > Sent: 30 Jan '07 10:58 > > > Yaron, > > Do you want to end up generating a single large sparse matrix? Like a > MPIAIJ matrix? Or do you want to somehow not store the entire huge matrix but > still be able to solve with the composed matrix? Or both? > > Barry > > > On Mon, 29 Jan 2007, yaron at oak-research.com wrote: > > > Barry- > > Yes, each block is a rectangular portion of the domain. Not so small > > though (more like 100 x 100 nodes) > > > > Yaron > > > > > > -------Original Message------- > > From: Barry Smith > > Subject: Re: Non-uniform 2D mesh questions > > Sent: 29 Jan '07 19:40 > > > > > > Yaron, > > > > Is each one of these "blocks" a small rectangular part of the > > domain (like a 4 by 5 set of nodes)? I don't understand what you > > want to do. > > > > Barry > > > > > > On Mon, 29 Jan 2007, yaron at oak-research.com wrote: > > > > > Hi all > > > I have a laplace-type problem that's physically built from repeating > > > instances of the same block. > > > I'm creating matrices for the individual blocks, and I'd like to reuse > > > the individual block matrices in order to compose the complete > > problem. > > > (i.e. if there are 10K instances of 20 blocks, I'd like to build 20 > > matrices, > > > then use them to compose the large complete matrix) > > > Is a 2D DA the right object to do that?
And if so, where can I find > a > > > small example of building the DA object in parallel, then using the > > > different (for every instance) mappings of local nodes to global > nodes > > in > > > order to build the complete matrix? > > > > > > > > > Thanks > > > Yaron > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
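The approach Barry sketches above (one DA for the whole tiled domain, DAGetMatrix() for the properly preallocated "huge" Mat, and MatSetValuesStencil() with natural grid i,j indices) might look roughly like the following. This is only a sketch against the 2007-era PETSc 2.3.x DA interface, not code from this thread: the 512x512 global grid size and the block_coeff() lookup (standing in for the admittance entries of a pre-triangulated block) are made-up placeholders.

/* Sketch only: compose per-block coefficients into one "huge" DA matrix
   with MatSetValuesStencil(), using natural grid (i,j) indices.
   PETSc 2.3.x-era interface; GM, GN and block_coeff() are hypothetical. */
#include "petscmat.h"
#include "petscda.h"

#define GM 512
#define GN 512

/* Hypothetical lookup: coupling between node (i,j) and neighbour d
   (0 = centre, 1..4 = west/east/south/north) for the block owning (i,j). */
static PetscScalar block_coeff(PetscInt i,PetscInt j,PetscInt d)
{
  return (d == 0) ? 4.0 : -1.0;  /* placeholder values only */
}

int main(int argc,char **argv)
{
  DA             da;
  Mat            A;
  PetscInt       xs,ys,xm,ym,i,j;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  /* one DA spanning the whole tiled domain */
  ierr = DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_STAR,
                    GM,GN,PETSC_DECIDE,PETSC_DECIDE,1,1,
                    PETSC_NULL,PETSC_NULL,&da);CHKERRQ(ierr);
  ierr = DAGetMatrix(da,MATMPIAIJ,&A);CHKERRQ(ierr);  /* preallocated "huge" Mat */

  ierr = DAGetCorners(da,&xs,&ys,PETSC_NULL,&xm,&ym,PETSC_NULL);CHKERRQ(ierr);
  for (j=ys; j<ys+ym; j++) {
    for (i=xs; i<xs+xm; i++) {
      MatStencil  row,col[5];
      PetscScalar v[5];
      PetscInt    nc = 0;
      row.i = i; row.j = j;
      /* centre entry plus whichever neighbours exist, in natural (i,j) indices */
      col[nc].i = i;   col[nc].j = j;   v[nc] = block_coeff(i,j,0); nc++;
      if (i > 0)    { col[nc].i = i-1; col[nc].j = j;   v[nc] = block_coeff(i,j,1); nc++; }
      if (i < GM-1) { col[nc].i = i+1; col[nc].j = j;   v[nc] = block_coeff(i,j,2); nc++; }
      if (j > 0)    { col[nc].i = i;   col[nc].j = j-1; v[nc] = block_coeff(i,j,3); nc++; }
      if (j < GN-1) { col[nc].i = i;   col[nc].j = j+1; v[nc] = block_coeff(i,j,4); nc++; }
      /* ADD_VALUES so contributions from blocks sharing a boundary node accumulate */
      ierr = MatSetValuesStencil(A,1,&row,nc,col,v,ADD_VALUES);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = DADestroy(da);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}

Whether a single DA fits depends on the blocks really mapping onto a logically rectangular node numbering; if the block interiors stay unstructured, the IS/AO route asked about above is probably closer to what is needed.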
From jinzishuai at yahoo.com Wed Feb 7 10:27:48 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Wed, 7 Feb 2007 08:27:48 -0800 (PST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: Message-ID: <218214.27889.qm@web36214.mail.mud.yahoo.com> Thank you very much, Satish. You are right. From the log_summary, the communication takes slightly more time on the shared-memory machine than on the cluster even after using taskset. This is still hard to understand since I think in-memory operations have to be orders of magnitude faster than network operations (gigabit ethernet). By the way, I took a look at the specs of my shared-memory machine (Sun Fire Server 4600). It seems that each CPU socket has its own DIMMs of RAM. I wonder if there is a speed issue if one has to copy data from the RAM of one CPU to another. Thanks. Shi --- Satish Balay wrote: > A couple of comments: > > - with the dual core opteron - the memorybandwith > per core is now > reduced by half - so the performance suffers. > However memory > bandwidth across CPUs is scalable. [6.4 Gb/s per > each node or 3.2Gb/s > per core] > > - Current generation Intel Core 2 duo appears to > claim having > sufficient bandwidth [15.3Gb/s per node = 7.6Gb/s > per core?]
so from > this bandwidth number - this chip might do better > than the AMD > chip. However I'm not sure if there is a SMP with > this chip - which > has scalable memory system [across say 8 nodes - as > you currently > have..] > > - Older intel SMP boxes has a single memory bank > shared across all the > CPUs [so effective bandwidth per CPU was pretty > small. Optrons' > scalable architecture looked much better than the > older intel SMPs] > > - From previous log_summary - part of the > inefficiency of the SMP box > [when compared to the cluster] was in the MPI > performance. Do you > still see this effect in the '-np 8' runs? If so > this could be the > [part of the] reason for this 30% reduction in > performance. > > Satish > > On Mon, 5 Feb 2007, Shi Jin wrote: > > > Hi there, > > > > I have made some new progress on the issue of SMP > > performance. Since my shared memory machine is a 8 > > dual-core Opteron machine. I think the two cores > on a > > single CPU chip shares the memory bandwidth. > > Therefore, if I can avoid using the same core on > the > > chip, I can get some performance improvement. > Indeed, > > I am able to do this by the linux command taskset. > > > Here is what I did: > > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 > ../spAF > > This way, I specifically ask the processes to be > run > > on the first core on the CPUs. > > By doing this, my performance is doubled compared > with > > the simple petscmpirun -n 8 ../spAF > > > > So this test shows that we do suffer from the > > competition of resources of multiple processes, > > especially when we use 16 processes. > > > > However, I should point out that even with the > help > > taskset, the shared-memory performance is still > 30% > > less than that on the cluster. > > > > I am not sure whether this problem exists > specifically > > for the AMD machines or it applys to any > shared-memory > > architecture. > > > > Thanks. > > Shi > > > > --- Shi Jin wrote: > > > > > Hi there, > > > > > > I am fairly new to PETSc but have 5 years of MPI > > > programming already. I recently took on a > project of > > > analyzing a finite element code written in C > with > > > PETSc. > > > I found out that on a shared-memory machine > (60GB > > > RAM, > > > 16 CPUS), the code runs around 4 times slower > > > than > > > on a distributed memory cluster (4GB Ram, > > > 4CPU/node), > > > although they yield identical results. > > > There are 1.6Million finite elements in the > problem > > > so > > > it is a fairly large calculation. The total > memory > > > used is 3GBx16=48GB. > > > > > > Both the two systems run Linux as OS and the > same > > > code > > > is compiled against the same version of MPICH-2 > and > > > PETSc. > > > > > > The shared-memory machine is actually a little > > > faster > > > than the cluster machines in terms of single > process > > > runs. > > > > > > I am surprised at this result since we usually > tend > > > to > > > think that shared-memory would be much faster > since > > > the in-memory operation is much faster that the > > > network communication. > > > > > > However, I read the PETSc FAQ and found that > "the > > > speed of sparse matrix computations is almost > > > totally > > > determined by the speed of the memory, not the > speed > > > of the CPU". > > > This makes me wonder whether the poor > performance of > > > my code on a shared-memory machine is due to the > > > competition of different process on the same > memory > > > bus. 
Since the code is still MPI based, a lot of > > > data > > > are moving around inside the memory. Is this a > > > reasonable explanation of what I observed? > > > > > > Thank you very much. > > > > > > Shi > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Do you Yahoo!? > > > Everyone is raving about the all-new Yahoo! Mail > > > beta. > > > http://new.mail.yahoo.com > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > Expecting? Get great news right away with email > Auto-Check. > > Try the Yahoo! Mail Beta. > > > http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html > > > > > > > ____________________________________________________________________________________ Don't get soaked. Take a quick peak at the forecast with the Yahoo! Search weather shortcut. http://tools.search.yahoo.com/shortcuts/#loc_weather From balay at mcs.anl.gov Wed Feb 7 10:58:06 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 7 Feb 2007 10:58:06 -0600 (CST) Subject: PETSc runs slower on a shared memory machine than on a cluster In-Reply-To: <218214.27889.qm@web36214.mail.mud.yahoo.com> References: <218214.27889.qm@web36214.mail.mud.yahoo.com> Message-ID: Can you run the app with the following options [use one per run] - and see if it makes any difference in performance [in VecScatters] -vecscatter_rr -vecscatter_ssend -vecscatter_sendfirst Also - you might want to try using the latest mpich to see if there are any improvements. Regarding the hardware issues - yeah - AMD has a NUMA architecture [i.e access from memory from a different cpu is slower than the memory on the local CPU]. There could also be some OS issues wrt memory layout for MPI messages - or some other contention [perhaps IO interrupts from the OS?] that could be causing the slowdown. All of this is just a guess.. Satish On Wed, 7 Feb 2007, Shi Jin wrote: > Thank you very much, Satish. > You are right. From the log_summary, the communication > takes slightly more time on the shared memory than the > cluster even after using the taskset. > This is still hard to understand since I think > in-memory operations have to been orders of magnitude > faster than network opertations(gigabit ethernet). > > By the way, I took a look my the specs of my > shared-memory machine( Sun Fire Server 4600). > It seems that each CPU socket has its own DIMMS of > RAM. > I wonder if there is a speed issue if one has to copy > data from the RAM of one CPU to another. > > Thanks. > > Shi > --- Satish Balay wrote: > > > A couple of comments: > > > > - with the dual core opteron - the memorybandwith > > per core is now > > reduced by half - so the performance suffers. > > However memory > > bandwidth across CPUs is scalable. [6.4 Gb/s per > > each node or 3.2Gb/s > > per core] > > > > - Current generation Intel Core 2 duo appears to > > claim having > > sufficient bandwidth [15.3Gb/s per node = 7.6Gb/s > > per core?] so from > > this bandwidth number - this chip might do better > > than the AMD > > chip. However I'm not sure if there is a SMP with > > this chip - which > > has scalable memory system [across say 8 nodes - as > > you currently > > have..] > > > > - Older intel SMP boxes has a single memory bank > > shared across all the > > CPUs [so effective bandwidth per CPU was pretty > > small. 
Optrons' > > scalable architecture looked much better than the > > older intel SMPs] > > > > - From previous log_summary - part of the > > inefficiency of the SMP box > > [when compared to the cluster] was in the MPI > > performance. Do you > > still see this effect in the '-np 8' runs? If so > > this could be the > > [part of the] reason for this 30% reduction in > > performance. > > > > Satish > > > > On Mon, 5 Feb 2007, Shi Jin wrote: > > > > > Hi there, > > > > > > I have made some new progress on the issue of SMP > > > performance. Since my shared memory machine is a 8 > > > dual-core Opteron machine. I think the two cores > > on a > > > single CPU chip shares the memory bandwidth. > > > Therefore, if I can avoid using the same core on > > the > > > chip, I can get some performance improvement. > > Indeed, > > > I am able to do this by the linux command taskset. > > > > > Here is what I did: > > > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 > > ../spAF > > > This way, I specifically ask the processes to be > > run > > > on the first core on the CPUs. > > > By doing this, my performance is doubled compared > > with > > > the simple petscmpirun -n 8 ../spAF > > > > > > So this test shows that we do suffer from the > > > competition of resources of multiple processes, > > > especially when we use 16 processes. > > > > > > However, I should point out that even with the > > help > > > taskset, the shared-memory performance is still > > 30% > > > less than that on the cluster. > > > > > > I am not sure whether this problem exists > > specifically > > > for the AMD machines or it applys to any > > shared-memory > > > architecture. > > > > > > Thanks. > > > Shi > > > > > > --- Shi Jin wrote: > > > > > > > Hi there, > > > > > > > > I am fairly new to PETSc but have 5 years of MPI > > > > programming already. I recently took on a > > project of > > > > analyzing a finite element code written in C > > with > > > > PETSc. > > > > I found out that on a shared-memory machine > > (60GB > > > > RAM, > > > > 16 CPUS), the code runs around 4 times slower > > > > than > > > > on a distributed memory cluster (4GB Ram, > > > > 4CPU/node), > > > > although they yield identical results. > > > > There are 1.6Million finite elements in the > > problem > > > > so > > > > it is a fairly large calculation. The total > > memory > > > > used is 3GBx16=48GB. > > > > > > > > Both the two systems run Linux as OS and the > > same > > > > code > > > > is compiled against the same version of MPICH-2 > > and > > > > PETSc. > > > > > > > > The shared-memory machine is actually a little > > > > faster > > > > than the cluster machines in terms of single > > process > > > > runs. > > > > > > > > I am surprised at this result since we usually > > tend > > > > to > > > > think that shared-memory would be much faster > > since > > > > the in-memory operation is much faster that the > > > > network communication. > > > > > > > > However, I read the PETSc FAQ and found that > > "the > > > > speed of sparse matrix computations is almost > > > > totally > > > > determined by the speed of the memory, not the > > speed > > > > of the CPU". > > > > This makes me wonder whether the poor > > performance of > > > > my code on a shared-memory machine is due to the > > > > competition of different process on the same > > memory > > > > bus. Since the code is still MPI based, a lot of > > > > data > > > > are moving around inside the memory. Is this a > > > > reasonable explanation of what I observed? 
> > > > > > > > Thank you very much. > > > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > Do you Yahoo!? > > > > Everyone is raving about the all-new Yahoo! Mail > > > > beta. > > > > http://new.mail.yahoo.com > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Expecting? Get great news right away with email > > Auto-Check. > > > Try the Yahoo! Mail Beta. > > > > > > http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > Don't get soaked. Take a quick peak at the forecast > with the Yahoo! Search weather shortcut. > http://tools.search.yahoo.com/shortcuts/#loc_weather > > From zonexo at gmail.com Thu Feb 8 09:47:24 2007 From: zonexo at gmail.com (Ben Tay) Date: Thu, 8 Feb 2007 23:47:24 +0800 Subject: understanding the output from -info Message-ID: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> Hi, i'm trying to solve my cfd code using PETSc in parallel. Besides the linear eqns for PETSc, other parts of the code has also been parallelized using MPI. however i find that the parallel version of the code running on 4 processors is even slower than the sequential version. in order to find out why, i've used the -info option to print out the details. there are 2 linear equations being solved - momentum and poisson. the momentum one is twice the size of the poisson. it is shown below: [0] User provided function(): (Fortran):PETSc successfully started: procs 4 [1] User provided function(): (Fortran):PETSc successfully started: procs 4 [3] User provided function(): (Fortran):PETSc successfully started: procs 4 [2] User provided function(): (Fortran):PETSc successfully started: procs 4 [0] PetscGetHostName(): Rejecting domainname, likely is NIS atlas2-c12.(none) [0] User provided function(): Running on machine: atlas2-c12 [1] PetscGetHostName(): Rejecting domainname, likely is NIS atlas2-c12.(none) [1] User provided function(): Running on machine: atlas2-c12 [3] PetscGetHostName(): Rejecting domainname, likely is NIS atlas2-c08.(none) [3] User provided function(): Running on machine: atlas2-c08 [2] PetscGetHostName(): Rejecting domainname, likely is NIS atlas2-c08.(none) [2] User provided function(): Running on machine: atlas2-c08 [0] PetscCommDuplicate(): Duplicating a communicator 91 141 max tags = 1073741823 [1] PetscCommDuplicate(): Duplicating a communicator 91 141 max tags = 1073741823 [2] PetscCommDuplicate(): Duplicating a communicator 91 141 max tags = 1073741823 [3] PetscCommDuplicate(): Duplicating a communicator 91 141 max tags = 1073741823 [0] PetscCommDuplicate(): Duplicating a communicator 92 143 max tags = 1073741823 [2] PetscCommDuplicate(): Duplicating a communicator 92 143 max tags = 1073741823 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Duplicating a communicator 92 143 max tags = 1073741823 [3] PetscCommDuplicate(): Duplicating a communicator 92 143 max tags = 1073741823 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 
143 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 0 3200 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 3200 6400 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 6400 9600 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 9600 12800 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [1] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 3200 6400 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [3] PetscCommDuplicate(): Using internal PETSc communicator 91 141 9600 12800 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [0] PetscCommDuplicate(): Using internal PETSc communicator 91 141 [2] PetscCommDuplicate(): Using internal PETSc communicator 91 141 0 3200 6400 9600 [1] MatStashScatterBegin_Private(): No of messages: 0 [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [3] MatStashScatterBegin_Private(): No of messages: 0 [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 3200; storage space: 4064 unneeded,53536 used [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 3200; storage space: 4064 unneeded,53536 used [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 18 [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 18 [3] Mat_CheckInode(): Found 1600 nodes of 3200. Limit used: 5. Using Inode routines [1] Mat_CheckInode(): Found 1600 nodes of 3200. Limit used: 5. Using Inode routines [0] MatStashScatterBegin_Private(): No of messages: 0 [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. 
[2] MatStashScatterBegin_Private(): No of messages: 0 [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 3200; storage space: 4064 unneeded,53536 used [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 18 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 3200; storage space: 3120 unneeded,54480 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 18 [2] Mat_CheckInode(): Found 1600 nodes of 3200. Limit used: 5. Using Inode routines [0] Mat_CheckInode(): Found 1600 nodes of 3200. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [1] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 640; storage space: 53776 unneeded,3824 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [1] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 2560)/(num_localrows 3200) > 0.6. Use CompressedRow routines. [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [0] VecScatterCreate(): General case: MPI to Seq [0] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 320; storage space: 55688 unneeded,1912 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 2880)/(num_localrows 3200) > 0.6. Use CompressedRow routines. [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 2880)/(num_localrows 3200) > 0.6. Use CompressedRow routines. [3] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [3] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 320; storage space: 55688 unneeded,1912 used [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [3] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 2880)/(num_localrows 3200) > 0.6. Use CompressedRow routines. 
[2] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [2] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 3200 X 640; storage space: 53776 unneeded,3824 used [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 6 [2] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 2560)/(num_localrows 3200) > 0.6. Use CompressedRow routines. [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] PCSetUp(): Setting up new PC [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PCSetUp(): Setting up new PC [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PCSetUp(): Setting up new PC [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc 
communicator 92 143 [3] PCSetUp(): Setting up new PC [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PCSetUp(): Setting up new PC [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] KSPDefaultConverged(): user has provided nonzero initial guess, computing 2-norm of preconditioned RHS [0] KSPDefaultConverged(): Linear solver has converged. Residual norm 1.00217e-05 is less than relative tolerance 1e-05 times initial right hand side norm 6.98447 at iteration 5 [0] MatStashScatterBegin_Private(): No of messages: 0 [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 1600; storage space: 774 unneeded,13626 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9 [0] Mat_CheckInode(): Found 1600 nodes out of 1600 rows. Not using Inode routines [1] MatStashScatterBegin_Private(): No of messages: 0 [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 1600; storage space: 1016 unneeded,13384 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9 [1] Mat_CheckInode(): Found 1600 nodes out of 1600 rows. Not using Inode routines [2] MatStashScatterBegin_Private(): No of messages: 0 [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 1600; storage space: 1016 unneeded,13384 used [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9 [2] Mat_CheckInode(): Found 1600 nodes out of 1600 rows. Not using Inode routines [3] MatStashScatterBegin_Private(): No of messages: 0 [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs. [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 1600; storage space: 1016 unneeded,13384 used [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9 [3] Mat_CheckInode(): Found 1600 nodes out of 1600 rows. 
Not using Inode routines [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [2] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [2] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 320; storage space: 13444 unneeded,956 used [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] VecScatterCreate(): General case: MPI to Seq [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 3 [2] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 1280)/(num_localrows 1600) > 0.6. Use CompressedRow routines. [0] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 160; storage space: 13922 unneeded,478 used [0] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 160; storage space: 13922 unneeded,478 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 3 [0] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 1440)/(num_localrows 1600) > 0.6. Use CompressedRow routines. [3] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [3] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 160; storage space: 13922 unneeded,478 used [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 3 [3] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 1440)/(num_localrows 1600) > 0.6. Use CompressedRow routines. [1] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter [1] MatSetOption_Inode(): Not using Inode routines due to MatSetOption(MAT_DO_NOT_USE_INODES [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 1600 X 320; storage space: 13444 unneeded,956 used [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 3 [1] Mat_CheckCompressedRow(): Found the ratio (num_zerorows 1280)/(num_localrows 1600) > 0.6. Use CompressedRow routines. [1] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. 
[2] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [2] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs. [3] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [1] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs. [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PCSetUp(): Setting up new PC [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [3] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PCSetUp(): Setting up new PC [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [1] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PCSetUp(): Setting up new PC [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [2] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PCSetUp(): Setting up new PC [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PCSetUp(): Setting up new PC [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PCSetUp(): Setting up new PC [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] PetscCommDuplicate(): Using internal PETSc communicator 92 143 [0] KSPDefaultConverged(): Linear solver has converged. Residual norm 8.84097e-05 is less than relative tolerance 1e-05 times initial right hand side norm 8.96753 at iteration 212 1 1.000000000000000E-002 1.15678640520876 0.375502846664950 i saw some statements stating "seq". am i running in sequential or parallel mode? have i preallocated too much space? lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and b_sta and b_end from VecGetOwnershipRange should always be the same value, right? Thank you. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dalcinl at gmail.com Thu Feb 8 10:50:17 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 8 Feb 2007 13:50:17 -0300 Subject: understanding the output from -info In-Reply-To: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> Message-ID: On 2/8/07, Ben Tay wrote: > i'm trying to solve my cfd code using PETSc in parallel. Besides the linear > eqns for PETSc, other parts of the code has also been parallelized using > MPI. Finite elements or finite differences, or what? > however i find that the parallel version of the code running on 4 processors > is even slower than the sequential version. Can you monitor the convergence and iteration count of momentum and poisson steps? > in order to find out why, i've used the -info option to print out the > details. there are 2 linear equations being solved - momentum and poisson. > the momentum one is twice the size of the poisson. it is shown below: Can you use -log_summary command line option and send the output attached? > i saw some statements stating "seq". am i running in sequential or parallel > mode? have i preallocated too much space? It seems you are running in parallel. The "Seq" are related to local, internal objects. In PETSc, parallel matrices have inner sequential matrices. > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and b_sta and > b_end from VecGetOwnershipRange should always be the same value, right? I should. If not, you are likely going to get an runtime error. Regards, -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From zonexo at gmail.com Fri Feb 9 06:34:34 2007 From: zonexo at gmail.com (Ben Tay) Date: Fri, 9 Feb 2007 20:34:34 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> Message-ID: <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> Hi, I've tried to use log_summary but nothing came out? Did I miss out something? It worked when I used -info... On 2/9/07, Lisandro Dalcin wrote: > > On 2/8/07, Ben Tay wrote: > > i'm trying to solve my cfd code using PETSc in parallel. Besides the > linear > > eqns for PETSc, other parts of the code has also been parallelized using > > MPI. > > Finite elements or finite differences, or what? > > > however i find that the parallel version of the code running on 4 > processors > > is even slower than the sequential version. > > Can you monitor the convergence and iteration count of momentum and > poisson steps? > > > > in order to find out why, i've used the -info option to print out the > > details. there are 2 linear equations being solved - momentum and > poisson. > > the momentum one is twice the size of the poisson. it is shown below: > > Can you use -log_summary command line option and send the output attached? > > > i saw some statements stating "seq". am i running in sequential or > parallel > > mode? have i preallocated too much space? > > It seems you are running in parallel. The "Seq" are related to local, > internal objects. In PETSc, parallel matrices have inner sequential > matrices. 
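The "Seq" matrices mentioned just above are the two sequential blocks every MPIAIJ matrix keeps per process: a diagonal block for the locally owned columns and an off-diagonal block for columns owned by other processes. The "storage space: ... unneeded, ... used" lines in the -info output report how much of the preallocated space went unused. A rough sketch of preallocating such a matrix explicitly, with per-row estimates read loosely off the log above (3200 local rows, at most 18 nonzeros per row in the diagonal block and 6 in the off-diagonal block); illustrative only, not code from this thread:

/* Sketch: preallocate an MPIAIJ matrix so MatSetValues() causes no mallocs.
   d_nz/o_nz are per-row upper bounds for the diagonal and off-diagonal blocks;
   the numbers below are only rough guesses taken from the -info output. */
#include "petscmat.h"

int main(int argc,char **argv)
{
  Mat            A;
  PetscInt       mlocal = 3200;  /* locally owned rows/columns */
  PetscInt       d_nz   = 18;    /* nonzeros per row coupling local columns */
  PetscInt       o_nz   = 6;     /* nonzeros per row coupling off-process columns */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,mlocal,mlocal,PETSC_DETERMINE,PETSC_DETERMINE,
                         d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr);
  /* ... MatSetValues() / MatAssemblyBegin() / MatAssemblyEnd() as usual ... */
  ierr = MatDestroy(A);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}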
> > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and b_sta > and > > b_end from VecGetOwnershipRange should always be the same value, right? > > I should. If not, you are likely going to get an runtime error. > > Regards, > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 9 08:01:09 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 08:01:09 -0600 (CST) Subject: understanding the output from -info In-Reply-To: <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> Message-ID: -log_summary On Fri, 9 Feb 2007, Ben Tay wrote: > Hi, > > I've tried to use log_summary but nothing came out? Did I miss out > something? It worked when I used -info... > > > On 2/9/07, Lisandro Dalcin wrote: > > > > On 2/8/07, Ben Tay wrote: > > > i'm trying to solve my cfd code using PETSc in parallel. Besides the > > linear > > > eqns for PETSc, other parts of the code has also been parallelized using > > > MPI. > > > > Finite elements or finite differences, or what? > > > > > however i find that the parallel version of the code running on 4 > > processors > > > is even slower than the sequential version. > > > > Can you monitor the convergence and iteration count of momentum and > > poisson steps? > > > > > > > in order to find out why, i've used the -info option to print out the > > > details. there are 2 linear equations being solved - momentum and > > poisson. > > > the momentum one is twice the size of the poisson. it is shown below: > > > > Can you use -log_summary command line option and send the output attached? > > > > > i saw some statements stating "seq". am i running in sequential or > > parallel > > > mode? have i preallocated too much space? > > > > It seems you are running in parallel. The "Seq" are related to local, > > internal objects. In PETSc, parallel matrices have inner sequential > > matrices. > > > > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and b_sta > > and > > > b_end from VecGetOwnershipRange should always be the same value, right? > > > > I should. If not, you are likely going to get an runtime error. > > > > Regards, > > > > -- > > Lisandro Dalc??n > > --------------- > > Centro Internacional de M??todos Computacionales en Ingenier??a (CIMEC) > > Instituto de Desarrollo Tecnol??gico para la Industria Qu??mica (INTEC) > > Consejo Nacional de Investigaciones Cient??ficas y T??cnicas (CONICET) > > PTLC - G??emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > From zonexo at gmail.com Fri Feb 9 08:20:47 2007 From: zonexo at gmail.com (Ben Tay) Date: Fri, 9 Feb 2007 22:20:47 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> Message-ID: <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> ya, i did use -log_summary. but no output..... 
On 2/9/07, Barry Smith wrote: > > > -log_summary > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > Hi, > > > > I've tried to use log_summary but nothing came out? Did I miss out > > something? It worked when I used -info... > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > On 2/8/07, Ben Tay wrote: > > > > i'm trying to solve my cfd code using PETSc in parallel. Besides the > > > linear > > > > eqns for PETSc, other parts of the code has also been parallelized > using > > > > MPI. > > > > > > Finite elements or finite differences, or what? > > > > > > > however i find that the parallel version of the code running on 4 > > > processors > > > > is even slower than the sequential version. > > > > > > Can you monitor the convergence and iteration count of momentum and > > > poisson steps? > > > > > > > > > > in order to find out why, i've used the -info option to print out > the > > > > details. there are 2 linear equations being solved - momentum and > > > poisson. > > > > the momentum one is twice the size of the poisson. it is shown > below: > > > > > > Can you use -log_summary command line option and send the output > attached? > > > > > > > i saw some statements stating "seq". am i running in sequential or > > > parallel > > > > mode? have i preallocated too much space? > > > > > > It seems you are running in parallel. The "Seq" are related to local, > > > internal objects. In PETSc, parallel matrices have inner sequential > > > matrices. > > > > > > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and > b_sta > > > and > > > > b_end from VecGetOwnershipRange should always be the same value, > right? > > > > > > I should. If not, you are likely going to get an runtime error. > > > > > > Regards, > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 9 08:59:16 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Feb 2007 08:59:16 -0600 Subject: understanding the output from -info In-Reply-To: <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> Message-ID: Impossible, please check the spelling, and make sure your command line was not truncated. Matt On 2/9/07, Ben Tay wrote: > > ya, i did use -log_summary. but no output..... > > On 2/9/07, Barry Smith wrote: > > > > > > -log_summary > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > Hi, > > > > > > I've tried to use log_summary but nothing came out? Did I miss out > > > something? It worked when I used -info... > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > i'm trying to solve my cfd code using PETSc in parallel. Besides > > the > > > > linear > > > > > eqns for PETSc, other parts of the code has also been parallelized > > using > > > > > MPI. > > > > > > > > Finite elements or finite differences, or what? 
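One more thing worth checking when -log_summary appears to print nothing: the summary table is produced inside PetscFinalize(), so a run that stops, aborts, or calls MPI_Finalize() before reaching that call will never show it, even though -info (which prints as it goes) still works. A minimal sketch of the required structure, in C for brevity (the Fortran interface follows the same pattern):

/* Sketch: -log_summary writes its table from inside PetscFinalize(), so the
   option only produces output if the program actually reaches this call. */
#include "petsc.h"

int main(int argc,char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
  /* ... assemble and solve as usual ... */
  ierr = PetscFinalize();CHKERRQ(ierr);  /* the -log_summary table is printed here */
  return 0;
}

Run as, for example, petscmpirun -n 4 ./code -log_summary, all on one line, so the option is not lost to a truncated command line (the point the next message suggests checking).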
> > > > > > > > > however i find that the parallel version of the code running on 4 > > > > processors > > > > > is even slower than the sequential version. > > > > > > > > Can you monitor the convergence and iteration count of momentum and > > > > poisson steps? > > > > > > > > > > > > > in order to find out why, i've used the -info option to print out > > the > > > > > details. there are 2 linear equations being solved - momentum and > > > > poisson. > > > > > the momentum one is twice the size of the poisson. it is shown > > below: > > > > > > > > Can you use -log_summary command line option and send the output > > attached? > > > > > > > > > i saw some statements stating "seq". am i running in sequential or > > > > parallel > > > > > mode? have i preallocated too much space? > > > > > > > > It seems you are running in parallel. The "Seq" are related to > > local, > > > > internal objects. In PETSc, parallel matrices have inner sequential > > > > matrices. > > > > > > > > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and > > b_sta > > > > and > > > > > b_end from VecGetOwnershipRange should always be the same value, > > right? > > > > > > > > I should. If not, you are likely going to get an runtime error. > > > > > > > > Regards, > > > > > > > > -- > > > > Lisandro Dalc?n > > > > --------------- > > > > Centro Internacional de M?todos Computacionales en Ingenier?a > > (CIMEC) > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > > (INTEC) > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Fri Feb 9 09:24:07 2007 From: zonexo at gmail.com (Ben Tay) Date: Fri, 9 Feb 2007 23:24:07 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> Message-ID: <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> Well, I don't know what's wrong. I did the same thing for -info and it worked. Anyway, is there any other way? Like I can use -mat_view or call matview( ... ) to view a matrix. Is there a similar subroutine for me to call? Thank you. On 2/9/07, Matthew Knepley wrote: > > Impossible, please check the spelling, and make sure your > command line was not truncated. > > Matt > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > ya, i did use -log_summary. but no output..... > > > > On 2/9/07, Barry Smith wrote: > > > > > > > > > -log_summary > > > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > > > Hi, > > > > > > > > I've tried to use log_summary but nothing came out? 
Did I miss out > > > > something? It worked when I used -info... > > > > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > i'm trying to solve my cfd code using PETSc in parallel. Besides > > > the > > > > > linear > > > > > > eqns for PETSc, other parts of the code has also been > > > parallelized using > > > > > > MPI. > > > > > > > > > > Finite elements or finite differences, or what? > > > > > > > > > > > however i find that the parallel version of the code running on > > > 4 > > > > > processors > > > > > > is even slower than the sequential version. > > > > > > > > > > Can you monitor the convergence and iteration count of momentum > > > and > > > > > poisson steps? > > > > > > > > > > > > > > > > in order to find out why, i've used the -info option to print > > > out the > > > > > > details. there are 2 linear equations being solved - momentum > > > and > > > > > poisson. > > > > > > the momentum one is twice the size of the poisson. it is shown > > > below: > > > > > > > > > > Can you use -log_summary command line option and send the output > > > attached? > > > > > > > > > > > i saw some statements stating "seq". am i running in sequential > > > or > > > > > parallel > > > > > > mode? have i preallocated too much space? > > > > > > > > > > It seems you are running in parallel. The "Seq" are related to > > > local, > > > > > internal objects. In PETSc, parallel matrices have inner > > > sequential > > > > > matrices. > > > > > > > > > > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange and > > > b_sta > > > > > and > > > > > > b_end from VecGetOwnershipRange should always be the same value, > > > right? > > > > > > > > > > I should. If not, you are likely going to get an runtime error. > > > > > > > > > > Regards, > > > > > > > > > > -- > > > > > Lisandro Dalc?n > > > > > --------------- > > > > > Centro Internacional de M?todos Computacionales en Ingenier?a > > > (CIMEC) > > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > > > (INTEC) > > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas > > > (CONICET) > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > -- > One trouble is that despite this system, anyone who reads journals widely > and critically is forced to realize that there are scarcely any bars to > eventual > publication. There seems to be no study too fragmented, no hypothesis too > trivial, no literature citation too biased or too egotistical, no design > too > warped, no methodology too bungled, no presentation of results too > inaccurate, too obscure, and too contradictory, no analysis too > self-serving, > no argument too circular, no conclusions too trifling or too unjustified, > and > no grammar and syntax too offensive for a paper to end up in print. -- > Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Feb 9 09:27:30 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Feb 2007 09:27:30 -0600 Subject: understanding the output from -info In-Reply-To: <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> Message-ID: Problems do not go away by ignoring them. Something is wrong here, and it may affect the rest of your program. Please try to run an example: cd src/ksp/ksp/examples/tutorials make ex2 ./ex2 -log_summary Matt On 2/9/07, Ben Tay wrote: > > Well, I don't know what's wrong. I did the same thing for -info and it > worked. Anyway, is there any other way? > > Like I can use -mat_view or call matview( ... ) to view a matrix. Is there > a similar subroutine for me to call? > > Thank you. > > > On 2/9/07, Matthew Knepley wrote: > > > > Impossible, please check the spelling, and make sure your > > command line was not truncated. > > > > Matt > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > ya, i did use -log_summary. but no output..... > > > > > > On 2/9/07, Barry Smith wrote: > > > > > > > > > > > > -log_summary > > > > > > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > > > > > Hi, > > > > > > > > > > I've tried to use log_summary but nothing came out? Did I miss out > > > > > > > > > something? It worked when I used -info... > > > > > > > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > i'm trying to solve my cfd code using PETSc in parallel. > > > > Besides the > > > > > > linear > > > > > > > eqns for PETSc, other parts of the code has also been > > > > parallelized using > > > > > > > MPI. > > > > > > > > > > > > Finite elements or finite differences, or what? > > > > > > > > > > > > > however i find that the parallel version of the code running > > > > on 4 > > > > > > processors > > > > > > > is even slower than the sequential version. > > > > > > > > > > > > Can you monitor the convergence and iteration count of momentum > > > > and > > > > > > poisson steps? > > > > > > > > > > > > > > > > > > > in order to find out why, i've used the -info option to print > > > > out the > > > > > > > details. there are 2 linear equations being solved - momentum > > > > and > > > > > > poisson. > > > > > > > the momentum one is twice the size of the poisson. it is shown > > > > below: > > > > > > > > > > > > Can you use -log_summary command line option and send the output > > > > attached? > > > > > > > > > > > > > i saw some statements stating "seq". am i running in > > > > sequential or > > > > > > parallel > > > > > > > mode? have i preallocated too much space? > > > > > > > > > > > > It seems you are running in parallel. The "Seq" are related to > > > > local, > > > > > > internal objects. In PETSc, parallel matrices have inner > > > > sequential > > > > > > matrices. > > > > > > > > > > > > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange > > > > and b_sta > > > > > > and > > > > > > > b_end from VecGetOwnershipRange should always be the same > > > > value, right? > > > > > > > > > > > > I should. If not, you are likely going to get an runtime error. 
> > > > > > > > > > > > Regards, > > > > > > > > > > > > -- > > > > > > Lisandro Dalc?n > > > > > > --------------- > > > > > > Centro Internacional de M?todos Computacionales en Ingenier?a > > > > (CIMEC) > > > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > > > > (INTEC) > > > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas > > > > (CONICET) > > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > One trouble is that despite this system, anyone who reads journals > > widely > > and critically is forced to realize that there are scarcely any bars to > > eventual > > publication. There seems to be no study too fragmented, no hypothesis > > too > > trivial, no literature citation too biased or too egotistical, no design > > too > > warped, no methodology too bungled, no presentation of results too > > inaccurate, too obscure, and too contradictory, no analysis too > > self-serving, > > no argument too circular, no conclusions too trifling or too > > unjustified, and > > no grammar and syntax too offensive for a paper to end up in print. -- > > Drummond Rennie > > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Fri Feb 9 10:16:56 2007 From: zonexo at gmail.com (Ben Tay) Date: Sat, 10 Feb 2007 00:16:56 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> Message-ID: <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> ops.... it worked for ex2 and ex2f ;-) so what could be wrong? is there some commands or subroutine which i must call? btw, i'm programming in fortran. thank you. On 2/9/07, Matthew Knepley wrote: > > Problems do not go away by ignoring them. Something is wrong here, and it > may > affect the rest of your program. Please try to run an example: > > cd src/ksp/ksp/examples/tutorials > make ex2 > ./ex2 -log_summary > > Matt > > On 2/9/07, Ben Tay wrote: > > > > Well, I don't know what's wrong. I did the same thing for -info and it > > worked. Anyway, is there any other way? > > > > Like I can use -mat_view or call matview( ... ) to view a matrix. Is > > there a similar subroutine for me to call? > > > > Thank you. > > > > > > On 2/9/07, Matthew Knepley wrote: > > > > > > Impossible, please check the spelling, and make sure your > > > command line was not truncated. > > > > > > Matt > > > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > ya, i did use -log_summary. but no output..... 
> > > > > > > > On 2/9/07, Barry Smith wrote: > > > > > > > > > > > > > > > -log_summary > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I've tried to use log_summary but nothing came out? Did I miss > > > > > out > > > > > > something? It worked when I used -info... > > > > > > > > > > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > i'm trying to solve my cfd code using PETSc in parallel. > > > > > Besides the > > > > > > > linear > > > > > > > > eqns for PETSc, other parts of the code has also been > > > > > parallelized using > > > > > > > > MPI. > > > > > > > > > > > > > > Finite elements or finite differences, or what? > > > > > > > > > > > > > > > however i find that the parallel version of the code running > > > > > on 4 > > > > > > > processors > > > > > > > > is even slower than the sequential version. > > > > > > > > > > > > > > Can you monitor the convergence and iteration count of > > > > > momentum and > > > > > > > poisson steps? > > > > > > > > > > > > > > > > > > > > > > in order to find out why, i've used the -info option to > > > > > print out the > > > > > > > > details. there are 2 linear equations being solved - > > > > > momentum and > > > > > > > poisson. > > > > > > > > the momentum one is twice the size of the poisson. it is > > > > > shown below: > > > > > > > > > > > > > > Can you use -log_summary command line option and send the > > > > > output attached? > > > > > > > > > > > > > > > i saw some statements stating "seq". am i running in > > > > > sequential or > > > > > > > parallel > > > > > > > > mode? have i preallocated too much space? > > > > > > > > > > > > > > It seems you are running in parallel. The "Seq" are related to > > > > > local, > > > > > > > internal objects. In PETSc, parallel matrices have inner > > > > > sequential > > > > > > > matrices. > > > > > > > > > > > > > > > lastly, if Ax=b, A_sta and A_end from MatGetOwnershipRange > > > > > and b_sta > > > > > > > and > > > > > > > > b_end from VecGetOwnershipRange should always be the same > > > > > value, right? > > > > > > > > > > > > > > I should. If not, you are likely going to get an runtime > > > > > error. > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > -- > > > > > > > Lisandro Dalc?n > > > > > > > --------------- > > > > > > > Centro Internacional de M?todos Computacionales en Ingenier?a > > > > > (CIMEC) > > > > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > > > > > (INTEC) > > > > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas > > > > > (CONICET) > > > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > One trouble is that despite this system, anyone who reads journals > > > widely > > > and critically is forced to realize that there are scarcely any bars > > > to eventual > > > publication. 
There seems to be no study too fragmented, no hypothesis > > > too > > > trivial, no literature citation too biased or too egotistical, no > > > design too > > > warped, no methodology too bungled, no presentation of results too > > > inaccurate, too obscure, and too contradictory, no analysis too > > > self-serving, > > > no argument too circular, no conclusions too trifling or too > > > unjustified, and > > > no grammar and syntax too offensive for a paper to end up in print. -- > > > Drummond Rennie > > > > > > > > > -- > One trouble is that despite this system, anyone who reads journals widely > and critically is forced to realize that there are scarcely any bars to > eventual > publication. There seems to be no study too fragmented, no hypothesis too > trivial, no literature citation too biased or too egotistical, no design > too > warped, no methodology too bungled, no presentation of results too > inaccurate, too obscure, and too contradictory, no analysis too > self-serving, > no argument too circular, no conclusions too trifling or too unjustified, > and > no grammar and syntax too offensive for a paper to end up in print. -- > Drummond Rennie > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 9 10:20:13 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Feb 2007 10:20:13 -0600 Subject: understanding the output from -info In-Reply-To: <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> Message-ID: On 2/9/07, Ben Tay wrote: > > ops.... it worked for ex2 and ex2f ;-) > > so what could be wrong? is there some commands or subroutine which i must > call? btw, i'm programming in fortran. > Yes, you must call PetscFinalize() in your code. Matt thank you. > > > On 2/9/07, Matthew Knepley wrote: > > > > Problems do not go away by ignoring them. Something is wrong here, and > > it may > > affect the rest of your program. Please try to run an example: > > > > cd src/ksp/ksp/examples/tutorials > > make ex2 > > ./ex2 -log_summary > > > > Matt > > > > On 2/9/07, Ben Tay wrote: > > > > > > Well, I don't know what's wrong. I did the same thing for -info and it > > > worked. Anyway, is there any other way? > > > > > > Like I can use -mat_view or call matview( ... ) to view a matrix. Is > > > there a similar subroutine for me to call? > > > > > > Thank you. > > > > > > > > > On 2/9/07, Matthew Knepley wrote: > > > > > > > > Impossible, please check the spelling, and make sure your > > > > command line was not truncated. > > > > > > > > Matt > > > > > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > > > ya, i did use -log_summary. but no output..... > > > > > > > > > > On 2/9/07, Barry Smith wrote: > > > > > > > > > > > > > > > > > > -log_summary > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I've tried to use log_summary but nothing came out? Did I miss > > > > > > out > > > > > > > something? It worked when I used -info... 
> > > > > > > > > > > > > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > > i'm trying to solve my cfd code using PETSc in parallel. > > > > > > Besides the > > > > > > > > linear > > > > > > > > > eqns for PETSc, other parts of the code has also been > > > > > > parallelized using > > > > > > > > > MPI. > > > > > > > > > > > > > > > > Finite elements or finite differences, or what? > > > > > > > > > > > > > > > > > however i find that the parallel version of the code > > > > > > running on 4 > > > > > > > > processors > > > > > > > > > is even slower than the sequential version. > > > > > > > > > > > > > > > > Can you monitor the convergence and iteration count of > > > > > > momentum and > > > > > > > > poisson steps? > > > > > > > > > > > > > > > > > > > > > > > > > in order to find out why, i've used the -info option to > > > > > > print out the > > > > > > > > > details. there are 2 linear equations being solved - > > > > > > momentum and > > > > > > > > poisson. > > > > > > > > > the momentum one is twice the size of the poisson. it is > > > > > > shown below: > > > > > > > > > > > > > > > > Can you use -log_summary command line option and send the > > > > > > output attached? > > > > > > > > > > > > > > > > > i saw some statements stating "seq". am i running in > > > > > > sequential or > > > > > > > > parallel > > > > > > > > > mode? have i preallocated too much space? > > > > > > > > > > > > > > > > It seems you are running in parallel. The "Seq" are related > > > > > > to local, > > > > > > > > internal objects. In PETSc, parallel matrices have inner > > > > > > sequential > > > > > > > > matrices. > > > > > > > > > > > > > > > > > lastly, if Ax=b, A_sta and A_end > > > > > > from MatGetOwnershipRange and b_sta > > > > > > > > and > > > > > > > > > b_end from VecGetOwnershipRange should always be the same > > > > > > value, right? > > > > > > > > > > > > > > > > I should. If not, you are likely going to get an runtime > > > > > > error. > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > -- > > > > > > > > Lisandro Dalc?n > > > > > > > > --------------- > > > > > > > > Centro Internacional de M?todos Computacionales en > > > > > > Ingenier?a (CIMEC) > > > > > > > > Instituto de Desarrollo Tecnol?gico para la Industria > > > > > > Qu?mica (INTEC) > > > > > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas > > > > > > (CONICET) > > > > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > One trouble is that despite this system, anyone who reads journals > > > > widely > > > > and critically is forced to realize that there are scarcely any bars > > > > to eventual > > > > publication. There seems to be no study too fragmented, no > > > > hypothesis too > > > > trivial, no literature citation too biased or too egotistical, no > > > > design too > > > > warped, no methodology too bungled, no presentation of results too > > > > inaccurate, too obscure, and too contradictory, no analysis too > > > > self-serving, > > > > no argument too circular, no conclusions too trifling or too > > > > unjustified, and > > > > no grammar and syntax too offensive for a paper to end up in print. 
> > > > -- Drummond Rennie > > > > > > > > > > > > > > > -- > > One trouble is that despite this system, anyone who reads journals > > widely > > and critically is forced to realize that there are scarcely any bars to > > eventual > > publication. There seems to be no study too fragmented, no hypothesis > > too > > trivial, no literature citation too biased or too egotistical, no design > > too > > warped, no methodology too bungled, no presentation of results too > > inaccurate, too obscure, and too contradictory, no analysis too > > self-serving, > > no argument too circular, no conclusions too trifling or too > > unjustified, and > > no grammar and syntax too offensive for a paper to end up in print. -- > > Drummond Rennie > > > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... URL: From dimitri.lecas at c-s.fr Fri Feb 9 11:33:40 2007 From: dimitri.lecas at c-s.fr (LECAS Dimitri) Date: Fri, 09 Feb 2007 18:33:40 +0100 Subject: Partitioning on a mpiaij matrix Message-ID: <38643393c5.393c538643@c-s.fr> Hello, I thinks i find a "bug". I try to use parmetis for partitioning a matrix created with MatCreateMPIAIJ. Here the output : [0]PETSC ERROR: No support for this operation for this object type! [0]PETSC ERROR: Mat type mpiadj! [0]PETSC ERROR: MatSetValues() line 825 in src/mat/interface/matrix.c [0]PETSC ERROR: MatConvert_Basic() line 34 in src/mat/utils/convert.c [0]PETSC ERROR: MatConvert() line 3134 in src/mat/interface/matrix.c [0]PETSC ERROR: MatPartitioningApply_Parmetis() line 47 in src/mat/partition/impls/pmetis/pmetis.c [0]PETSC ERROR: MatPartitioningApply() line 238 in src/mat/partition/partition.c If i understand correctly, MatPartitioningApply_Parmetis try to convert the matrix in format MPIAdj and failed because we can't use MatSetValues on a MPIAdj. It's possible to easily avoid this bug ? Best regards -- Dimitri Lecas From bsmith at mcs.anl.gov Fri Feb 9 13:09:17 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 13:09:17 -0600 (CST) Subject: Partitioning on a mpiaij matrix In-Reply-To: <38643393c5.393c538643@c-s.fr> References: <38643393c5.393c538643@c-s.fr> Message-ID: MatConvert() checks for a variety of converts; from the code /* 3) See if a good general converter is registered for the desired class */ conv = B->ops->convertfrom; ierr = MatDestroy(B);CHKERRQ(ierr); if (conv) goto foundconv; now MATMPIADJ has a MatConvertFrom that SHOULD be listed in the function table so it should not fall into the default MatConvert_Basic(). What version of PETSc are you using? Maybe an older one that does not have this converter? If you are using 2.3.2 or petsc-dev you can put a breakpoint in MatConvert() and try to see why it is not picking up the convertfrom function? It is possible some bug that we are not aware of but I have difficulty seeing what could be going wrong. 
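
For reference, the user-side calling sequence involved here is roughly the sketch below. It is illustrative only: the helper name and the matrix A are placeholders, the partitioner is selected by its registered string name, and MatPartitioningDestroy takes a pointer argument in newer PETSc releases.

#include "petscmat.h"

/* Sketch: partition the rows of an assembled parallel (MPIAIJ) matrix A with
   ParMETIS through the MatPartitioning interface. PETSc converts A to MPIAdj
   internally inside MatPartitioningApply, which is where the error reported
   above is raised. */
PetscErrorCode PartitionRows(Mat A, IS *rowsToProc)
{
  MatPartitioning part;
  PetscErrorCode  ierr;

  PetscFunctionBegin;
  ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);
  ierr = MatPartitioningSetType(part, "parmetis");CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
  /* On return, rowsToProc gives, for each locally owned row, the process
     it is assigned to by the partitioner. */
  ierr = MatPartitioningApply(part, rowsToProc);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(part);CHKERRQ(ierr); /* takes &part in newer versions */
  PetscFunctionReturn(0);
}
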
Good luck, Barry On Fri, 9 Feb 2007, LECAS Dimitri wrote: > Hello, > > I thinks i find a "bug". I try to use parmetis for partitioning a matrix > created with MatCreateMPIAIJ. > > Here the output : > > [0]PETSC ERROR: No support for this operation for this object type! > [0]PETSC ERROR: Mat type mpiadj! > [0]PETSC ERROR: MatSetValues() line 825 in src/mat/interface/matrix.c > [0]PETSC ERROR: MatConvert_Basic() line 34 in src/mat/utils/convert.c > [0]PETSC ERROR: MatConvert() line 3134 in src/mat/interface/matrix.c > [0]PETSC ERROR: MatPartitioningApply_Parmetis() line 47 in > src/mat/partition/impls/pmetis/pmetis.c > [0]PETSC ERROR: MatPartitioningApply() line 238 in > src/mat/partition/partition.c > > If i understand correctly, MatPartitioningApply_Parmetis try to convert > the matrix in format MPIAdj and failed because we can't use MatSetValues > on a MPIAdj. > > It's possible to easily avoid this bug ? > > Best regards > > From dimitri.lecas at c-s.fr Fri Feb 9 14:21:03 2007 From: dimitri.lecas at c-s.fr (LECAS Dimitri) Date: Fri, 09 Feb 2007 21:21:03 +0100 Subject: Partitioning on a mpiaij matrix Message-ID: <399fc3cc1b.3cc1b399fc@c-s.fr> ----- Original Message ----- From: Barry Smith Date: Friday, February 9, 2007 8:09 pm Subject: Re: Partitioning on a mpiaij matrix > > MatConvert() checks for a variety of converts; from the code > > /* 3) See if a good general converter is registered for the > desired class */ > conv = B->ops->convertfrom; > ierr = MatDestroy(B);CHKERRQ(ierr); > if (conv) goto foundconv; > > now MATMPIADJ has a MatConvertFrom that SHOULD be listed in the > function table > so it should not fall into the default MatConvert_Basic(). > > What version of PETSc are you using? Maybe an older one that > does not have > this converter? If you are using 2.3.2 or petsc-dev you can put a > breakpoint in MatConvert() and try to see why it is not picking up > the > convertfrom function? It is possible some bug that we are not > aware of > but I have difficulty seeing what could be going wrong. > > Good luck, > > Barry > > I used the 2.3.2-p8 from the lite package (the one without the documentation). I'm sorry i'm not longer at work so i can't test anything before monday. -- Dimitri Lecas From jinzishuai at yahoo.com Fri Feb 9 16:59:22 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 9 Feb 2007 14:59:22 -0800 (PST) Subject: A 3D example of KSPSolve? Message-ID: <930330.25934.qm@web36210.mail.mud.yahoo.com> Hi there, I am tuning our 3D FEM CFD code written with PETSc. The code doesn't scale very well. For example, with 8 processes on a linux cluster, the speedup we achieve with a fairly large problem size(million of elements) is only 3 to 4 using the Congugate gradient solver. We can achieve a speed up of a 6.5 using a GMRes solver but the wall clock time of a GMRes is longer than a CG solver which indicates that CG is the faster solver and it scales not as good as GMRes. Is this generally true? I then went to the examples and find a 2D example of KSPSolve (ex2.c). I let the code ran with a 1000x1000 mesh and get a linear scaling of the CG solver and a super linear scaling of the GMRes. These are both much better than our code. However, I think the 2D nature of the sample problem might help the scaling of the code. So I would like to try some 3D example using the KSPSolve. Unfortunately, I couldn't find such an example either in the src/ksp/ksp/examples/tutorials directory or by google search. 
There are a couple of 3D examples in the src/ksp/ksp/examples/tutorials but they are about the SNES not KSPSolve. If anyone can provide me with such an example, I would really appreciate it. Thanks a lot. Shi ____________________________________________________________________________________ Finding fabulous fares is fun. Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains. http://farechase.yahoo.com/promo-generic-14795097 From bsmith at mcs.anl.gov Fri Feb 9 18:53:09 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 18:53:09 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <930330.25934.qm@web36210.mail.mud.yahoo.com> References: <930330.25934.qm@web36210.mail.mud.yahoo.com> Message-ID: Shi, There is never a better test problem then your actual problem. Send the results from running on 1, 4, and 8 processes with the options -log_summary -ksp_view (use the optimized version of PETSc (running config/configure.py --with-debugging=0)) Barry On Fri, 9 Feb 2007, Shi Jin wrote: > Hi there, > > I am tuning our 3D FEM CFD code written with PETSc. > The code doesn't scale very well. For example, with 8 > processes on a linux cluster, the speedup we achieve > with a fairly large problem size(million of elements) > is only 3 to 4 using the Congugate gradient solver. We > can achieve a speed up of a 6.5 using a GMRes solver > but the wall clock time of a GMRes is longer than a CG > solver which indicates that CG is the faster solver > and it scales not as good as GMRes. Is this generally > true? > > I then went to the examples and find a 2D example of > KSPSolve (ex2.c). I let the code ran with a 1000x1000 > mesh and get a linear scaling of the CG solver and a > super linear scaling of the GMRes. These are both much > better than our code. However, I think the 2D nature > of the sample problem might help the scaling of the > code. So I would like to try some 3D example using the > KSPSolve. Unfortunately, I couldn't find such an > example either in the src/ksp/ksp/examples/tutorials > directory or by google search. There are a couple of > 3D examples in the src/ksp/ksp/examples/tutorials but > they are about the SNES not KSPSolve. If anyone can > provide me with such an example, I would really > appreciate it. > Thanks a lot. > > Shi > > > > ____________________________________________________________________________________ > Finding fabulous fares is fun. > Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains. > http://farechase.yahoo.com/promo-generic-14795097 > > From zonexo at gmail.com Fri Feb 9 18:51:43 2007 From: zonexo at gmail.com (Ben Tay) Date: Sat, 10 Feb 2007 08:51:43 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> Message-ID: <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> Ya, that's the mistake. I changed part of the code resulting in PetscFinalize not being called. 
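
For anyone else hitting this: -log_summary collects its statistics during the run but only assembles and prints the report inside PetscFinalize(), so a code path that never reaches that call produces no summary at all. A minimal C skeleton of the required structure is sketched below; in a Fortran code the corresponding calls are call PetscInitialize(PETSC_NULL_CHARACTER,ierr) at the start and call PetscFinalize(ierr) at the end.

#include "petscksp.h"

/* Minimal skeleton: every PETSc program brackets its work with
   PetscInitialize() and PetscFinalize(); the -log_summary report is
   emitted from inside PetscFinalize(). */
int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, (char*)0, (char*)0);CHKERRQ(ierr);

  /* ... create Mat/Vec/KSP objects, assemble, solve, destroy them ... */

  ierr = PetscFinalize();CHKERRQ(ierr); /* -log_summary output appears here */
  return 0;
}
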
Here's the output: ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- /home/enduser/g0306332/ns2d/a.out on a linux-mpi named atlas00.nus.edu.sgwith 4 processors, by g0306332 Sat Feb 10 08:32:08 2007 Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 Max Max/Min Avg Total Time (sec): 2.826e+02 2.08192 1.725e+02 Objects: 1.110e+02 1.00000 1.110e+02 Flops: 6.282e+08 1.00736 6.267e+08 2.507e+09 Flops/sec: 4.624e+06 2.08008 4.015e+06 1.606e+07 Memory: 1.411e+07 1.01142 5.610e+07 MPI Messages: 8.287e+03 1.90156 6.322e+03 2.529e+04 MPI Message Lengths: 6.707e+07 1.11755 1.005e+04 2.542e+08 MPI Reductions: 3.112e+03 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.7247e+02 100.0% 2.5069e+09 100.0% 2.529e+04 100.0% 1.005e+04 100.0% 1.245e+04 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops/sec: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run config/configure.py # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## ########################################################## ########################################################## # # # WARNING!!! # # # # This code was run without the PreLoadBegin() # # macros. To get timing results we always recommend # # preloading. otherwise timing numbers may be # # meaningless. 
# ########################################################## Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 0.0e+00 12 18 93 12 0 12 18 93 12 0 19 MatSolve 3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 0.0e+00 1 17 0 0 0 1 17 0 0 0 168 MatLUFactorNum 40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 85 MatILUFactorSym 2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatScale 20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 39 MatAssemblyBegin 40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05 8.0e+01 4 0 3 83 1 4 0 3 83 1 0 MatAssemblyEnd 40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02 6.4e+01 4 0 0 0 1 4 0 0 0 1 0 MatGetOrdering 2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00 3.8e+03 24 29 0 0 30 24 29 0 0 30 15 VecNorm 3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00 4.0e+03 21 2 0 0 32 21 2 0 0 32 1 VecScale 3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 738 VecCopy 155 1.0 1.0029e-0125.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 709 VecMAXPY 3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00 0.0e+00 1 31 0 0 0 1 31 0 0 0 498 VecAssemblyBegin 80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04 2.4e+02 2 0 4 5 2 2 0 4 5 2 0 VecAssemblyEnd 80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 0.0e+00 0 0 93 12 0 0 0 93 12 0 0 VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0 VecNormalize 3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00 3.9e+03 21 3 0 0 32 21 3 0 0 32 2 KSPGMRESOrthog 3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00 3.8e+03 25 58 0 0 30 25 58 0 0 30 30 KSPSetup 80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03 1.2e+04 62100 93 12 97 62100 93 12 97 23 PCSetUp 80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00 1.4e+01 0 2 0 0 0 0 2 0 0 0 83 PCSetUpOnBlocks 40 1.0 4.5418e-01 1.5 3.07e+07 1.5 0.0e+00 0.0e+00 1.0e+01 0 2 0 0 0 0 2 0 0 0 84 PCApply 3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00 4.0e+03 2 17 0 0 32 2 17 0 0 32 104 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. 
--- Event Stage 0: Main Stage Matrix 8 8 21136 0 Index Set 12 12 74952 0 Vec 81 81 1447476 0 Vec Scatter 2 2 0 0 Krylov Solver 4 4 33760 0 Preconditioner 4 4 392 0 ======================================================================================================================== Average time to get PetscTime(): 1.09673e-06 Average time for MPI_Barrier(): 3.90053e-05 Average time for zero size MPI_Send(): 1.65105e-05 OptionTable: -log_summary Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8 Configure run at: Thu Jan 18 12:23:31 2007 Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/ ----------------------------------------- Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 Using PETSc arch: linux-mpif90 ----------------------------------------- Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w ----------------------------------------- Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8-I/nas/lsftmp/g0306332/petsc- 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include -I/opt/mpich/myrinet/intel/include ------------------------------------------ Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\ -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl This is the result I get for running 20 steps. There are 2 matrix to be solved. I've only parallize the solving of linear equations and kept the rest of the code serial for this test. However, I found that it's much slower than the sequential version. From the ratio, it seems that MatScale and VecSet 's ratio are very high. I've done a scaling of 0.5 for momentum eqn. Is that the reason for the slowness? That is all I can decipher .... Thank you. On 2/10/07, Matthew Knepley wrote: > > On 2/9/07, Ben Tay wrote: > > > > ops.... it worked for ex2 and ex2f ;-) > > > > so what could be wrong? 
is there some commands or subroutine which i > > must call? btw, i'm programming in fortran. > > > > Yes, you must call PetscFinalize() in your code. > > Matt > > > thank you. > > > > > > On 2/9/07, Matthew Knepley wrote: > > > > > > Problems do not go away by ignoring them. Something is wrong here, and > > > it may > > > affect the rest of your program. Please try to run an example: > > > > > > cd src/ksp/ksp/examples/tutorials > > > make ex2 > > > ./ex2 -log_summary > > > > > > Matt > > > > > > On 2/9/07, Ben Tay wrote: > > > > > > > > Well, I don't know what's wrong. I did the same thing for -info and > > > > it worked. Anyway, is there any other way? > > > > > > > > Like I can use -mat_view or call matview( ... ) to view a matrix. Is > > > > there a similar subroutine for me to call? > > > > > > > > Thank you. > > > > > > > > > > > > On 2/9/07, Matthew Knepley wrote: > > > > > > > > > > Impossible, please check the spelling, and make sure your > > > > > command line was not truncated. > > > > > > > > > > Matt > > > > > > > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > > > > > ya, i did use -log_summary. but no output..... > > > > > > > > > > > > On 2/9/07, Barry Smith wrote: > > > > > > > > > > > > > > > > > > > > > -log_summary > > > > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > I've tried to use log_summary but nothing came out? Did I > > > > > > > miss out > > > > > > > > something? It worked when I used -info... > > > > > > > > > > > > > > > > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > > > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > > > i'm trying to solve my cfd code using PETSc in parallel. > > > > > > > Besides the > > > > > > > > > linear > > > > > > > > > > eqns for PETSc, other parts of the code has also been > > > > > > > parallelized using > > > > > > > > > > MPI. > > > > > > > > > > > > > > > > > > Finite elements or finite differences, or what? > > > > > > > > > > > > > > > > > > > however i find that the parallel version of the code > > > > > > > running on 4 > > > > > > > > > processors > > > > > > > > > > is even slower than the sequential version. > > > > > > > > > > > > > > > > > > Can you monitor the convergence and iteration count of > > > > > > > momentum and > > > > > > > > > poisson steps? > > > > > > > > > > > > > > > > > > > > > > > > > > > > in order to find out why, i've used the -info option to > > > > > > > print out the > > > > > > > > > > details. there are 2 linear equations being solved - > > > > > > > momentum and > > > > > > > > > poisson. > > > > > > > > > > the momentum one is twice the size of the poisson. it is > > > > > > > shown below: > > > > > > > > > > > > > > > > > > Can you use -log_summary command line option and send the > > > > > > > output attached? > > > > > > > > > > > > > > > > > > > i saw some statements stating "seq". am i running in > > > > > > > sequential or > > > > > > > > > parallel > > > > > > > > > > mode? have i preallocated too much space? > > > > > > > > > > > > > > > > > > It seems you are running in parallel. The "Seq" are > > > > > > > related to local, > > > > > > > > > internal objects. In PETSc, parallel matrices have inner > > > > > > > sequential > > > > > > > > > matrices. 
> > > > > > > > > > > > > > > > > > > lastly, if Ax=b, A_sta and A_end > > > > > > > from MatGetOwnershipRange and b_sta > > > > > > > > > and > > > > > > > > > > b_end from VecGetOwnershipRange should always be the > > > > > > > same value, right? > > > > > > > > > > > > > > > > > > I should. If not, you are likely going to get an runtime > > > > > > > error. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Lisandro Dalc?n > > > > > > > > > --------------- > > > > > > > > > Centro Internacional de M?todos Computacionales en > > > > > > > Ingenier?a (CIMEC) > > > > > > > > > Instituto de Desarrollo Tecnol?gico para la Industria > > > > > > > Qu?mica (INTEC) > > > > > > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas > > > > > > > (CONICET) > > > > > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > One trouble is that despite this system, anyone who reads journals > > > > > widely > > > > > and critically is forced to realize that there are scarcely any > > > > > bars to eventual > > > > > publication. There seems to be no study too fragmented, no > > > > > hypothesis too > > > > > trivial, no literature citation too biased or too egotistical, no > > > > > design too > > > > > warped, no methodology too bungled, no presentation of results too > > > > > > > > > > inaccurate, too obscure, and too contradictory, no analysis too > > > > > self-serving, > > > > > no argument too circular, no conclusions too trifling or too > > > > > unjustified, and > > > > > no grammar and syntax too offensive for a paper to end up in > > > > > print. -- Drummond Rennie > > > > > > > > > > > > > > > > > > > > > -- > > > One trouble is that despite this system, anyone who reads journals > > > widely > > > and critically is forced to realize that there are scarcely any bars > > > to eventual > > > publication. There seems to be no study too fragmented, no hypothesis > > > too > > > trivial, no literature citation too biased or too egotistical, no > > > design too > > > warped, no methodology too bungled, no presentation of results too > > > inaccurate, too obscure, and too contradictory, no analysis too > > > self-serving, > > > no argument too circular, no conclusions too trifling or too > > > unjustified, and > > > no grammar and syntax too offensive for a paper to end up in print. -- > > > Drummond Rennie > > > > > > > > > > -- > One trouble is that despite this system, anyone who reads journals widely > and critically is forced to realize that there are scarcely any bars to > eventual > publication. There seems to be no study too fragmented, no hypothesis too > trivial, no literature citation too biased or too egotistical, no design > too > warped, no methodology too bungled, no presentation of results too > inaccurate, too obscure, and too contradictory, no analysis too > self-serving, > no argument too circular, no conclusions too trifling or too unjustified, > and > no grammar and syntax too offensive for a paper to end up in print. -- > Drummond Rennie > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Feb 9 19:15:49 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Feb 2007 19:15:49 -0600 Subject: understanding the output from -info In-Reply-To: <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> Message-ID: 1) These MFlop rates are terrible. It seems like your problem is way too small. 2) The load balance is not good. Matt On 2/9/07, Ben Tay wrote: > > Ya, that's the mistake. I changed part of the code resulting in > PetscFinalize not being called. > > Here's the output: > > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > /home/enduser/g0306332/ns2d/a.out on a linux-mpi named atlas00.nus.edu.sgwith 4 processors, by g0306332 Sat Feb 10 08:32:08 2007 > Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 > HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 > > Max Max/Min Avg Total > Time (sec): 2.826e+02 2.08192 1.725e+02 > Objects: 1.110e+02 1.00000 1.110e+02 > Flops: 6.282e+08 1.00736 6.267e+08 2.507e+09 > Flops/sec: 4.624e+06 2.08008 4.015e+06 1.606e+07 > Memory: 1.411e+07 1.01142 5.610e+07 > MPI Messages: 8.287e+03 1.90156 6.322e+03 2.529e+04 > MPI Message Lengths: 6.707e+07 1.11755 1.005e+04 2.542e+08 > MPI Reductions: 3.112e+03 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 1.7247e+02 100.0% 2.5069e+09 100.0% 2.529e+04 > 100.0% 1.005e+04 100.0% 1.245e+04 100.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops/sec: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all > processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! 
# > # # > # This code was compiled with a debugging option, # > # To get timing results run config/configure.py # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > > > ########################################################## > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was run without the PreLoadBegin() # > # macros. To get timing results we always recommend # > # preloading. otherwise timing numbers may be # > # meaningless. # > ########################################################## > > > Event Count Time (sec) > Flops/sec --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 > 0.0e+00 12 18 93 12 0 12 18 93 12 0 19 > MatSolve 3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 > 0.0e+00 1 17 0 0 0 1 17 0 0 0 168 > MatLUFactorNum 40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 85 > MatILUFactorSym 2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatScale 20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 39 > MatAssemblyBegin 40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05 > 8.0e+01 4 0 3 83 1 4 0 3 83 1 0 > MatAssemblyEnd 40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02 > 6.4e+01 4 0 0 0 1 4 0 0 0 1 0 > MatGetOrdering 2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00 > 3.8e+03 24 29 0 0 30 24 29 0 0 30 15 > VecNorm 3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00 > 4.0e+03 21 2 0 0 32 21 2 0 0 32 1 > VecScale 3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 738 > VecCopy 155 1.0 1.0029e-0125.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 709 > VecMAXPY 3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 1 31 0 0 0 1 31 0 0 0 498 > VecAssemblyBegin 80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04 > 2.4e+02 2 0 4 5 2 2 0 4 5 2 0 > VecAssemblyEnd 80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 > 0.0e+00 0 0 93 12 0 0 0 93 12 0 0 > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 11 0 0 0 0 11 0 0 0 0 0 > VecNormalize 3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00 > 3.9e+03 21 3 0 0 32 21 3 0 0 32 2 > KSPGMRESOrthog 3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00 > 3.8e+03 25 58 0 0 30 25 58 0 0 30 30 > KSPSetup 80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+01 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03 > 1.2e+04 62100 93 12 97 62100 93 12 97 23 > PCSetUp 80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00 > 1.4e+01 0 2 0 0 0 0 2 0 0 0 83 > PCSetUpOnBlocks 40 1.0 4.5418e-01 
1.5 3.07e+07 1.5 0.0e+00 0.0e+00 > 1.0e+01 0 2 0 0 0 0 2 0 0 0 84 > PCApply 3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00 > 4.0e+03 2 17 0 0 32 2 17 0 0 32 104 > ------------------------------------------------------------------------------------------------------------------------ > > > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > > --- Event Stage 0: Main Stage > > Matrix 8 8 21136 0 > Index Set 12 12 74952 0 > Vec 81 81 1447476 0 > Vec Scatter 2 2 0 0 > Krylov Solver 4 4 33760 0 > Preconditioner 4 4 392 0 > ======================================================================================================================== > > Average time to get PetscTime(): 1.09673e-06 > Average time for MPI_Barrier(): 3.90053e-05 > Average time for zero size MPI_Send(): 1.65105e-05 > OptionTable: -log_summary > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > sizeof(PetscScalar) 8 > Configure run at: Thu Jan 18 12:23:31 2007 > Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > --with-mpi-dir=/opt/mpich/myrinet/intel/ > ----------------------------------------- > Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg > Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP > Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux > Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 > Using PETSc arch: linux-mpif90 > ----------------------------------------- > Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > -w90 -w > ----------------------------------------- > Using include paths: -I/nas/lsftmp/g0306332/petsc- 2.3.2-p8-I/nas/lsftmp/g0306332/petsc- > 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include > -I/opt/mpich/myrinet/intel/include > ------------------------------------------ > Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > -w90 -w > Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 > -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts > -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc > -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 > -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide > -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib > -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm > -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa > -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90 > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\ > -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl > > > This is the result I get for running 20 steps. There are 2 matrix to be > solved. 
I've only parallize the solving of linear equations and kept the > rest of the code serial for this test. However, I found that it's much > slower than the sequential version. > > From the ratio, it seems that MatScale and VecSet 's ratio are very high. > I've done a scaling of 0.5 for momentum eqn. Is that the reason for the > slowness? That is all I can decipher .... > > Thank you. > > > > > > On 2/10/07, Matthew Knepley wrote: > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > ops.... it worked for ex2 and ex2f ;-) > > > > > > so what could be wrong? is there some commands or subroutine which i > > > must call? btw, i'm programming in fortran. > > > > > > > Yes, you must call PetscFinalize() in your code. > > > > Matt > > > > > > thank you. > > > > > > > > > On 2/9/07, Matthew Knepley wrote: > > > > > > > > Problems do not go away by ignoring them. Something is wrong here, > > > > and it may > > > > affect the rest of your program. Please try to run an example: > > > > > > > > cd src/ksp/ksp/examples/tutorials > > > > make ex2 > > > > ./ex2 -log_summary > > > > > > > > Matt > > > > > > > > On 2/9/07, Ben Tay wrote: > > > > > > > > > > Well, I don't know what's wrong. I did the same thing for -info > > > > > and it worked. Anyway, is there any other way? > > > > > > > > > > Like I can use -mat_view or call matview( ... ) to view a matrix. > > > > > Is there a similar subroutine for me to call? > > > > > > > > > > Thank you. > > > > > > > > > > > > > > > On 2/9/07, Matthew Knepley wrote: > > > > > > > > > > > > Impossible, please check the spelling, and make sure your > > > > > > command line was not truncated. > > > > > > > > > > > > Matt > > > > > > > > > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > > > > > > > ya, i did use -log_summary. but no output..... > > > > > > > > > > > > > > On 2/9/07, Barry Smith wrote: > > > > > > > > > > > > > > > > > > > > > > > > -log_summary > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > I've tried to use log_summary but nothing came out? Did I > > > > > > > > miss out > > > > > > > > > something? It worked when I used -info... > > > > > > > > > > > > > > > > > > > > > > > > > > > On 2/9/07, Lisandro Dalcin wrote: > > > > > > > > > > > > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote: > > > > > > > > > > > i'm trying to solve my cfd code using PETSc in > > > > > > > > parallel. Besides the > > > > > > > > > > linear > > > > > > > > > > > eqns for PETSc, other parts of the code has also been > > > > > > > > parallelized using > > > > > > > > > > > MPI. > > > > > > > > > > > > > > > > > > > > Finite elements or finite differences, or what? > > > > > > > > > > > > > > > > > > > > > however i find that the parallel version of the code > > > > > > > > running on 4 > > > > > > > > > > processors > > > > > > > > > > > is even slower than the sequential version. > > > > > > > > > > > > > > > > > > > > Can you monitor the convergence and iteration count of > > > > > > > > momentum and > > > > > > > > > > poisson steps? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > in order to find out why, i've used the -info option > > > > > > > > to print out the > > > > > > > > > > > details. there are 2 linear equations being solved - > > > > > > > > momentum and > > > > > > > > > > poisson. > > > > > > > > > > > the momentum one is twice the size of the poisson. 
it > > > > > > > > is shown below: > > > > > > > > > > > > > > > > > > > > Can you use -log_summary command line option and send > > > > > > > > the output attached? > > > > > > > > > > > > > > > > > > > > > i saw some statements stating "seq". am i running in > > > > > > > > sequential or > > > > > > > > > > parallel > > > > > > > > > > > mode? have i preallocated too much space? > > > > > > > > > > > > > > > > > > > > It seems you are running in parallel. The "Seq" are > > > > > > > > related to local, > > > > > > > > > > internal objects. In PETSc, parallel matrices have inner > > > > > > > > sequential > > > > > > > > > > matrices. > > > > > > > > > > > > > > > > > > > > > lastly, if Ax=b, A_sta and A_end > > > > > > > > from MatGetOwnershipRange and b_sta > > > > > > > > > > and > > > > > > > > > > > b_end from VecGetOwnershipRange should always be the > > > > > > > > same value, right? > > > > > > > > > > > > > > > > > > > > I should. If not, you are likely going to get an runtime > > > > > > > > error. > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Lisandro Dalc?n > > > > > > > > > > --------------- > > > > > > > > > > Centro Internacional de M?todos Computacionales en > > > > > > > > Ingenier?a (CIMEC) > > > > > > > > > > Instituto de Desarrollo Tecnol?gico para la Industria > > > > > > > > Qu?mica (INTEC) > > > > > > > > > > Consejo Nacional de Investigaciones Cient?ficas y > > > > > > > > T?cnicas (CONICET) > > > > > > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > One trouble is that despite this system, anyone who reads > > > > > > journals widely > > > > > > and critically is forced to realize that there are scarcely any > > > > > > bars to eventual > > > > > > publication. There seems to be no study too fragmented, no > > > > > > hypothesis too > > > > > > trivial, no literature citation too biased or too egotistical, > > > > > > no design too > > > > > > warped, no methodology too bungled, no presentation of results > > > > > > too > > > > > > inaccurate, too obscure, and too contradictory, no analysis too > > > > > > self-serving, > > > > > > no argument too circular, no conclusions too trifling or too > > > > > > unjustified, and > > > > > > no grammar and syntax too offensive for a paper to end up in > > > > > > print. -- Drummond Rennie > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > One trouble is that despite this system, anyone who reads journals > > > > widely > > > > and critically is forced to realize that there are scarcely any bars > > > > to eventual > > > > publication. There seems to be no study too fragmented, no > > > > hypothesis too > > > > trivial, no literature citation too biased or too egotistical, no > > > > design too > > > > warped, no methodology too bungled, no presentation of results too > > > > inaccurate, too obscure, and too contradictory, no analysis too > > > > self-serving, > > > > no argument too circular, no conclusions too trifling or too > > > > unjustified, and > > > > no grammar and syntax too offensive for a paper to end up in print. 
> > > > -- Drummond Rennie > > > > > > > > > > > > > > > > -- > > One trouble is that despite this system, anyone who reads journals > > widely > > and critically is forced to realize that there are scarcely any bars to > > eventual > > publication. There seems to be no study too fragmented, no hypothesis > > too > > trivial, no literature citation too biased or too egotistical, no design > > too > > warped, no methodology too bungled, no presentation of results too > > inaccurate, too obscure, and too contradictory, no analysis too > > self-serving, > > no argument too circular, no conclusions too trifling or too > > unjustified, and > > no grammar and syntax too offensive for a paper to end up in print. -- > > Drummond Rennie > > > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Feb 9 19:27:33 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 19:27:33 -0600 (CST) Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> Message-ID: Ben, 1) > > > > > > ########################################################## > > # # > > # WARNING!!! # > > # # > > # This code was compiled with a debugging option, # > > # To get timing results run config/configure.py # > > # using --with-debugging=no, the performance will # > > # be generally two or three times faster. # > > # # > > ########################################################## 2) In general to get any decent parallel performance you need to have at least 10,000 unknowns per process. 3) It is important that each proces have roughly the same number of nonzeros in the matrix. > > Event Count Time (sec) > > Flops/sec --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len > > MatSolve 3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 ^^^^^^ One process is taking 1.9 times for the matsolves then the fastest one. Since the MatSolves are not parallel this likely means that the "slow process" has much more nonzeros thant the "fast process" Barry On Fri, 9 Feb 2007, Matthew Knepley wrote: > 1) These MFlop rates are terrible. It seems like your problem is way too > small. > > 2) The load balance is not good. > > Matt > > On 2/9/07, Ben Tay wrote: > > > > Ya, that's the mistake. I changed part of the code resulting in > > PetscFinalize not being called. 
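For anyone hitting the same "-log_summary prints nothing" symptom: the summary table is generated inside PetscFinalize(), so a code path that never reaches that call produces no output at all. A minimal sketch of the required structure is below, in C for brevity (the code in this thread is Fortran, where the same rule applies to the corresponding PetscInitialize/PetscFinalize calls); everything between the two calls is just a placeholder.

    /* Minimal program structure needed for -log_summary to produce output:
       the report is printed from within PetscFinalize(). */
    #include "petscksp.h"

    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);

      /* ... create Mat/Vec/KSP objects, assemble, call KSPSolve(), etc. ... */

      ierr = PetscFinalize();CHKERRQ(ierr);   /* -log_summary output appears here */
      return 0;
    }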
> > > > > -- Drummond Rennie > > > > > > > > > > > > > > > > > > > > > > -- > > > One trouble is that despite this system, anyone who reads journals > > > widely > > > and critically is forced to realize that there are scarcely any bars to > > > eventual > > > publication. There seems to be no study too fragmented, no hypothesis > > > too > > > trivial, no literature citation too biased or too egotistical, no design > > > too > > > warped, no methodology too bungled, no presentation of results too > > > inaccurate, too obscure, and too contradictory, no analysis too > > > self-serving, > > > no argument too circular, no conclusions too trifling or too > > > unjustified, and > > > no grammar and syntax too offensive for a paper to end up in print. -- > > > Drummond Rennie > > > > > > > > > > From balay at mcs.anl.gov Fri Feb 9 19:41:16 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 9 Feb 2007 19:41:16 -0600 (CST) Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090434w4f0674e6s1c936cb410f3744a@mail.gmail.com> <804ab5d40702090620u5cf86c51s4e1b7b724eaf4f98@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> Message-ID: Looks like MatMult = 24sec Out of this the scatter time is: 22sec. Either something is wrong with your run - or MPI is really broken.. Satish > > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 > > > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 > > > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 From jinzishuai at yahoo.com Fri Feb 9 20:42:29 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 9 Feb 2007 18:42:29 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <790923.93477.qm@web36208.mail.mud.yahoo.com> Thank you. But my code has 10 calls to KSPSolve of three different linear systems at each time update. Should I strip it down to a single KSPSolve so that it is easier to analysis? I might have the code dump the Matrix and vector and write another code to read them into and call KSPSolve. I don't know whether this is worth doing or should I just send in the messy log file of the whole run. Thanks for any advice. Shi --- Barry Smith wrote: > > Shi, > > There is never a better test problem then your > actual problem. > Send the results from running on 1, 4, and 8 > processes with the options > -log_summary -ksp_view (use the optimized version of > PETSc (running > config/configure.py --with-debugging=0)) > > Barry > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > Hi there, > > > > I am tuning our 3D FEM CFD code written with > PETSc. > > The code doesn't scale very well. For example, > with 8 > > processes on a linux cluster, the speedup we > achieve > > with a fairly large problem size(million of > elements) > > is only 3 to 4 using the Congugate gradient > solver. We > > can achieve a speed up of a 6.5 using a GMRes > solver > > but the wall clock time of a GMRes is longer than > a CG > > solver which indicates that CG is the faster > solver > > and it scales not as good as GMRes. Is this > generally > > true? > > > > I then went to the examples and find a 2D example > of > > KSPSolve (ex2.c). I let the code ran with a > 1000x1000 > > mesh and get a linear scaling of the CG solver and > a > > super linear scaling of the GMRes. 
These are both > much > > better than our code. However, I think the 2D > nature > > of the sample problem might help the scaling of > the > > code. So I would like to try some 3D example using > the > > KSPSolve. Unfortunately, I couldn't find such an > > example either in the > src/ksp/ksp/examples/tutorials > > directory or by google search. There are a couple > of > > 3D examples in the src/ksp/ksp/examples/tutorials > but > > they are about the SNES not KSPSolve. If anyone > can > > provide me with such an example, I would really > > appreciate it. > > Thanks a lot. > > > > Shi > > > > > > > > > ____________________________________________________________________________________ > > Finding fabulous fares is fun. > > Let Yahoo! FareChase search your favorite travel > sites to find flight and hotel bargains. > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > ____________________________________________________________________________________ 8:00? 8:25? 8:40? Find a flick in no time with the Yahoo! Search movie showtime shortcut. http://tools.search.yahoo.com/shortcuts/#news From bsmith at mcs.anl.gov Fri Feb 9 20:47:17 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 20:47:17 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <790923.93477.qm@web36208.mail.mud.yahoo.com> References: <790923.93477.qm@web36208.mail.mud.yahoo.com> Message-ID: NO, NO, don't spend time stripping your code! Unproductive See the manul pages for PetscLogStageRegister(), PetscLogStagePush() and PetscLogStagePop(). All you need to do is maintain a seperate stage for each of your KSPSolves; in your case you'll create 3 stages. Barry On Fri, 9 Feb 2007, Shi Jin wrote: > Thank you. > But my code has 10 calls to KSPSolve of three > different linear systems at each time update. Should I > strip it down to a single KSPSolve so that it is > easier to analysis? I might have the code dump the > Matrix and vector and write another code to read them > into and call KSPSolve. I don't know whether this is > worth doing or should I just send in the messy log > file of the whole run. > Thanks for any advice. > > Shi > > --- Barry Smith wrote: > > > > > Shi, > > > > There is never a better test problem then your > > actual problem. > > Send the results from running on 1, 4, and 8 > > processes with the options > > -log_summary -ksp_view (use the optimized version of > > PETSc (running > > config/configure.py --with-debugging=0)) > > > > Barry > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > Hi there, > > > > > > I am tuning our 3D FEM CFD code written with > > PETSc. > > > The code doesn't scale very well. For example, > > with 8 > > > processes on a linux cluster, the speedup we > > achieve > > > with a fairly large problem size(million of > > elements) > > > is only 3 to 4 using the Congugate gradient > > solver. We > > > can achieve a speed up of a 6.5 using a GMRes > > solver > > > but the wall clock time of a GMRes is longer than > > a CG > > > solver which indicates that CG is the faster > > solver > > > and it scales not as good as GMRes. Is this > > generally > > > true? > > > > > > I then went to the examples and find a 2D example > > of > > > KSPSolve (ex2.c). I let the code ran with a > > 1000x1000 > > > mesh and get a linear scaling of the CG solver and > > a > > > super linear scaling of the GMRes. These are both > > much > > > better than our code. However, I think the 2D > > nature > > > of the sample problem might help the scaling of > > the > > > code. 
So I would like to try some 3D example using > > the > > > KSPSolve. Unfortunately, I couldn't find such an > > > example either in the > > src/ksp/ksp/examples/tutorials > > > directory or by google search. There are a couple > > of > > > 3D examples in the src/ksp/ksp/examples/tutorials > > but > > > they are about the SNES not KSPSolve. If anyone > > can > > > provide me with such an example, I would really > > > appreciate it. > > > Thanks a lot. > > > > > > Shi > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Finding fabulous fares is fun. > > > Let Yahoo! FareChase search your favorite travel > > sites to find flight and hotel bargains. > > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > > > > > > > > > > ____________________________________________________________________________________ > 8:00? 8:25? 8:40? Find a flick in no time > with the Yahoo! Search movie showtime shortcut. > http://tools.search.yahoo.com/shortcuts/#news > > From jinzishuai at yahoo.com Fri Feb 9 21:01:09 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 9 Feb 2007 19:01:09 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <867640.48509.qm@web36210.mail.mud.yahoo.com> Dear Barry, Thank you. I actually have done the staging already. I summarized the timing of the runs in google online spreadsheets. I have two runs. 1. with 400,000 finite elements: http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA 2. with 1,600,000 finite elements: http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ If you can take a look at them and give me some advice, I will be deeply grateful. Shi --- Barry Smith wrote: > > NO, NO, don't spend time stripping your code! > Unproductive > > See the manul pages for PetscLogStageRegister(), > PetscLogStagePush() and > PetscLogStagePop(). All you need to do is maintain a > seperate stage for each > of your KSPSolves; in your case you'll create 3 > stages. > > Barry > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > Thank you. > > But my code has 10 calls to KSPSolve of three > > different linear systems at each time update. > Should I > > strip it down to a single KSPSolve so that it is > > easier to analysis? I might have the code dump the > > Matrix and vector and write another code to read > them > > into and call KSPSolve. I don't know whether this > is > > worth doing or should I just send in the messy > log > > file of the whole run. > > Thanks for any advice. > > > > Shi > > > > --- Barry Smith wrote: > > > > > > > > Shi, > > > > > > There is never a better test problem then > your > > > actual problem. > > > Send the results from running on 1, 4, and 8 > > > processes with the options > > > -log_summary -ksp_view (use the optimized > version of > > > PETSc (running > > > config/configure.py --with-debugging=0)) > > > > > > Barry > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > Hi there, > > > > > > > > I am tuning our 3D FEM CFD code written with > > > PETSc. > > > > The code doesn't scale very well. For example, > > > with 8 > > > > processes on a linux cluster, the speedup we > > > achieve > > > > with a fairly large problem size(million of > > > elements) > > > > is only 3 to 4 using the Congugate gradient > > > solver. 
We > > > > can achieve a speed up of a 6.5 using a GMRes > > > solver > > > > but the wall clock time of a GMRes is longer > than > > > a CG > > > > solver which indicates that CG is the faster > > > solver > > > > and it scales not as good as GMRes. Is this > > > generally > > > > true? > > > > > > > > I then went to the examples and find a 2D > example > > > of > > > > KSPSolve (ex2.c). I let the code ran with a > > > 1000x1000 > > > > mesh and get a linear scaling of the CG solver > and > > > a > > > > super linear scaling of the GMRes. These are > both > > > much > > > > better than our code. However, I think the 2D > > > nature > > > > of the sample problem might help the scaling > of > > > the > > > > code. So I would like to try some 3D example > using > > > the > > > > KSPSolve. Unfortunately, I couldn't find such > an > > > > example either in the > > > src/ksp/ksp/examples/tutorials > > > > directory or by google search. There are a > couple > > > of > > > > 3D examples in the > src/ksp/ksp/examples/tutorials > > > but > > > > they are about the SNES not KSPSolve. If > anyone > > > can > > > > provide me with such an example, I would > really > > > > appreciate it. > > > > Thanks a lot. > > > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > Finding fabulous fares is fun. > > > > Let Yahoo! FareChase search your favorite > travel > > > sites to find flight and hotel bargains. > > > > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > 8:00? 8:25? 8:40? Find a flick in no time > > with the Yahoo! Search movie showtime shortcut. > > http://tools.search.yahoo.com/shortcuts/#news > > > > > > ____________________________________________________________________________________ Looking for earth-friendly autos? Browse Top Cars by "Green Rating" at Yahoo! Autos' Green Center. http://autos.yahoo.com/green_center/ From knepley at gmail.com Fri Feb 9 21:06:43 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 9 Feb 2007 21:06:43 -0600 Subject: A 3D example of KSPSolve? In-Reply-To: <867640.48509.qm@web36210.mail.mud.yahoo.com> References: <867640.48509.qm@web36210.mail.mud.yahoo.com> Message-ID: You really have to give us the log summary output. None of the relevant numbers are in your summary. Thanks, Matt On 2/9/07, Shi Jin wrote: > > Dear Barry, > > Thank you. > I actually have done the staging already. > I summarized the timing of the runs in google online > spreadsheets. I have two runs. > 1. with 400,000 finite elements: > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > 2. with 1,600,000 finite elements: > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > If you can take a look at them and give me some > advice, I will be deeply grateful. > > Shi > --- Barry Smith wrote: > > > > > NO, NO, don't spend time stripping your code! > > Unproductive > > > > See the manul pages for PetscLogStageRegister(), > > PetscLogStagePush() and > > PetscLogStagePop(). All you need to do is maintain a > > seperate stage for each > > of your KSPSolves; in your case you'll create 3 > > stages. > > > > Barry > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > Thank you. > > > But my code has 10 calls to KSPSolve of three > > > different linear systems at each time update. 
> > Should I > > > strip it down to a single KSPSolve so that it is > > > easier to analysis? I might have the code dump the > > > Matrix and vector and write another code to read > > them > > > into and call KSPSolve. I don't know whether this > > is > > > worth doing or should I just send in the messy > > log > > > file of the whole run. > > > Thanks for any advice. > > > > > > Shi > > > > > > --- Barry Smith wrote: > > > > > > > > > > > Shi, > > > > > > > > There is never a better test problem then > > your > > > > actual problem. > > > > Send the results from running on 1, 4, and 8 > > > > processes with the options > > > > -log_summary -ksp_view (use the optimized > > version of > > > > PETSc (running > > > > config/configure.py --with-debugging=0)) > > > > > > > > Barry > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > Hi there, > > > > > > > > > > I am tuning our 3D FEM CFD code written with > > > > PETSc. > > > > > The code doesn't scale very well. For example, > > > > with 8 > > > > > processes on a linux cluster, the speedup we > > > > achieve > > > > > with a fairly large problem size(million of > > > > elements) > > > > > is only 3 to 4 using the Congugate gradient > > > > solver. We > > > > > can achieve a speed up of a 6.5 using a GMRes > > > > solver > > > > > but the wall clock time of a GMRes is longer > > than > > > > a CG > > > > > solver which indicates that CG is the faster > > > > solver > > > > > and it scales not as good as GMRes. Is this > > > > generally > > > > > true? > > > > > > > > > > I then went to the examples and find a 2D > > example > > > > of > > > > > KSPSolve (ex2.c). I let the code ran with a > > > > 1000x1000 > > > > > mesh and get a linear scaling of the CG solver > > and > > > > a > > > > > super linear scaling of the GMRes. These are > > both > > > > much > > > > > better than our code. However, I think the 2D > > > > nature > > > > > of the sample problem might help the scaling > > of > > > > the > > > > > code. So I would like to try some 3D example > > using > > > > the > > > > > KSPSolve. Unfortunately, I couldn't find such > > an > > > > > example either in the > > > > src/ksp/ksp/examples/tutorials > > > > > directory or by google search. There are a > > couple > > > > of > > > > > 3D examples in the > > src/ksp/ksp/examples/tutorials > > > > but > > > > > they are about the SNES not KSPSolve. If > > anyone > > > > can > > > > > provide me with such an example, I would > > really > > > > > appreciate it. > > > > > Thanks a lot. > > > > > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > > Finding fabulous fares is fun. > > > > > Let Yahoo! FareChase search your favorite > > travel > > > > sites to find flight and hotel bargains. > > > > > > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > 8:00? 8:25? 8:40? Find a flick in no time > > > with the Yahoo! Search movie showtime shortcut. > > > http://tools.search.yahoo.com/shortcuts/#news > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > Looking for earth-friendly autos? > Browse Top Cars by "Green Rating" at Yahoo! Autos' Green Center. 
> http://autos.yahoo.com/green_center/ > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... URL: From jinzishuai at yahoo.com Fri Feb 9 21:23:10 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 9 Feb 2007 19:23:10 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <292553.57576.qm@web36210.mail.mud.yahoo.com> Sorry that is not informative. So I decide to attach the 5 files for NP=1,2,4,8,16 for the 400,000 finite element case. Please note that the simulation runs over 100 steps. The 1st step is first order update, named as stage 1. The rest 99 steps are second order updates. Within that, stage 2-9 are created for the 8 stages of a second order update. We should concentrate on the second order updates. So four calls to KSPSolve in the log file are important, in stage 4,5,6,and 8 separately. Pleaes let me know if you need any other information or explanation. Thank you very much. Shi --- Matthew Knepley wrote: > You really have to give us the log summary output. > None of the relevant > numbers are in your summary. > > Thanks, > > Matt > > On 2/9/07, Shi Jin wrote: > > > > Dear Barry, > > > > Thank you. > > I actually have done the staging already. > > I summarized the timing of the runs in google > online > > spreadsheets. I have two runs. > > 1. with 400,000 finite elements: > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > > 2. with 1,600,000 finite elements: > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > > > If you can take a look at them and give me some > > advice, I will be deeply grateful. > > > > Shi > > --- Barry Smith wrote: > > > > > > > > NO, NO, don't spend time stripping your code! > > > Unproductive > > > > > > See the manul pages for > PetscLogStageRegister(), > > > PetscLogStagePush() and > > > PetscLogStagePop(). All you need to do is > maintain a > > > seperate stage for each > > > of your KSPSolves; in your case you'll create 3 > > > stages. > > > > > > Barry > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > Thank you. > > > > But my code has 10 calls to KSPSolve of three > > > > different linear systems at each time update. > > > Should I > > > > strip it down to a single KSPSolve so that it > is > > > > easier to analysis? I might have the code dump > the > > > > Matrix and vector and write another code to > read > > > them > > > > into and call KSPSolve. I don't know whether > this > > > is > > > > worth doing or should I just send in the > messy > > > log > > > > file of the whole run. > > > > Thanks for any advice. > > > > > > > > Shi > > > > > > > > --- Barry Smith wrote: > > > > > > > > > > > > > > Shi, > > > > > > > > > > There is never a better test problem then > > > your > > > > > actual problem. 
> > > > > Send the results from running on 1, 4, and 8 > > > > > processes with the options > > > > > -log_summary -ksp_view (use the optimized > > > version of > > > > > PETSc (running > > > > > config/configure.py --with-debugging=0)) > > > > > > > > > > Barry > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > Hi there, > > > > > > > > > > > > I am tuning our 3D FEM CFD code written > with > > > > > PETSc. > > > > > > The code doesn't scale very well. For > example, > > > > > with 8 > > > > > > processes on a linux cluster, the speedup > we > > > > > achieve > > > > > > with a fairly large problem size(million > of > > > > > elements) > > > > > > is only 3 to 4 using the Congugate > gradient > > > > > solver. We > > > > > > can achieve a speed up of a 6.5 using a > GMRes > > > > > solver > > > > > > but the wall clock time of a GMRes is > longer > > > than > > > > > a CG > > > > > > solver which indicates that CG is the > faster > > > > > solver > > > > > > and it scales not as good as GMRes. Is > this > > > > > generally > > > > > > true? > > > > > > > > > > > > I then went to the examples and find a 2D > > > example > > > > > of > > > > > > KSPSolve (ex2.c). I let the code ran with > a > > > > > 1000x1000 > > > > > > mesh and get a linear scaling of the CG > solver > > > and > > > > > a > > > > > > super linear scaling of the GMRes. These > are > > > both > > > > > much > > > > > > better than our code. However, I think the > 2D > > > > > nature > > > > > > of the sample problem might help the > scaling > > > of > > > > > the > > > > > > code. So I would like to try some 3D > example > > > using > > > > > the > > > > > > KSPSolve. Unfortunately, I couldn't find > such > > > an > > > > > > example either in the > > > > > src/ksp/ksp/examples/tutorials > > > > > > directory or by google search. There are a > > > couple > > > > > of > > > > > > 3D examples in the > > > src/ksp/ksp/examples/tutorials > > > > > but > > > > > > they are about the SNES not KSPSolve. If > > > anyone > > > > > can > > > > > > provide me with such an example, I would > > > really > > > > > > appreciate it. > > > > > > Thanks a lot. > > > > > > > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > > > Finding fabulous fares is fun. > > > > > > Let Yahoo! FareChase search your favorite > > > travel > > > > > sites to find flight and hotel bargains. > > > > > > > > > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > 8:00? 8:25? 8:40? Find a flick in no time > > > > with the Yahoo! Search movie showtime > shortcut. > > > > http://tools.search.yahoo.com/shortcuts/#news > > > > > === message truncated === ____________________________________________________________________________________ Finding fabulous fares is fun. Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains. http://farechase.yahoo.com/promo-generic-14795097 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log-1.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: log-2.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log-4.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log-8.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log-16.txt URL: From bsmith at mcs.anl.gov Fri Feb 9 22:37:18 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 22:37:18 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <292553.57576.qm@web36210.mail.mud.yahoo.com> References: <292553.57576.qm@web36210.mail.mud.yahoo.com> Message-ID: What are all the calls for MatGetRow() for? They are consuming a great deal of time. Is there anyway to get rid of them? Barry On Fri, 9 Feb 2007, Shi Jin wrote: > Sorry that is not informative. > So I decide to attach the 5 files for NP=1,2,4,8,16 > for > the 400,000 finite element case. > > Please note that the simulation runs over 100 steps. > The 1st step is first order update, named as stage 1. > The rest 99 steps are second order updates. Within > that, stage 2-9 are created for the 8 stages of a > second order update. We should concentrate on the > second order updates. So four calls to KSPSolve in the > log file are important, in stage 4,5,6,and 8 > separately. > Pleaes let me know if you need any other information > or explanation. > Thank you very much. > > Shi > --- Matthew Knepley wrote: > > > You really have to give us the log summary output. > > None of the relevant > > numbers are in your summary. > > > > Thanks, > > > > Matt > > > > On 2/9/07, Shi Jin wrote: > > > > > > Dear Barry, > > > > > > Thank you. > > > I actually have done the staging already. > > > I summarized the timing of the runs in google > > online > > > spreadsheets. I have two runs. > > > 1. with 400,000 finite elements: > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > > > 2. with 1,600,000 finite elements: > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > > > > > If you can take a look at them and give me some > > > advice, I will be deeply grateful. > > > > > > Shi > > > --- Barry Smith wrote: > > > > > > > > > > > NO, NO, don't spend time stripping your code! > > > > Unproductive > > > > > > > > See the manul pages for > > PetscLogStageRegister(), > > > > PetscLogStagePush() and > > > > PetscLogStagePop(). All you need to do is > > maintain a > > > > seperate stage for each > > > > of your KSPSolves; in your case you'll create 3 > > > > stages. > > > > > > > > Barry > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > Thank you. > > > > > But my code has 10 calls to KSPSolve of three > > > > > different linear systems at each time update. > > > > Should I > > > > > strip it down to a single KSPSolve so that it > > is > > > > > easier to analysis? I might have the code dump > > the > > > > > Matrix and vector and write another code to > > read > > > > them > > > > > into and call KSPSolve. I don't know whether > > this > > > > is > > > > > worth doing or should I just send in the > > messy > > > > log > > > > > file of the whole run. > > > > > Thanks for any advice. > > > > > > > > > > Shi > > > > > > > > > > --- Barry Smith wrote: > > > > > > > > > > > > > > > > > Shi, > > > > > > > > > > > > There is never a better test problem then > > > > your > > > > > > actual problem. 
> > > > > > Send the results from running on 1, 4, and 8 > > > > > > processes with the options > > > > > > -log_summary -ksp_view (use the optimized > > > > version of > > > > > > PETSc (running > > > > > > config/configure.py --with-debugging=0)) > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > I am tuning our 3D FEM CFD code written > > with > > > > > > PETSc. > > > > > > > The code doesn't scale very well. For > > example, > > > > > > with 8 > > > > > > > processes on a linux cluster, the speedup > > we > > > > > > achieve > > > > > > > with a fairly large problem size(million > > of > > > > > > elements) > > > > > > > is only 3 to 4 using the Congugate > > gradient > > > > > > solver. We > > > > > > > can achieve a speed up of a 6.5 using a > > GMRes > > > > > > solver > > > > > > > but the wall clock time of a GMRes is > > longer > > > > than > > > > > > a CG > > > > > > > solver which indicates that CG is the > > faster > > > > > > solver > > > > > > > and it scales not as good as GMRes. Is > > this > > > > > > generally > > > > > > > true? > > > > > > > > > > > > > > I then went to the examples and find a 2D > > > > example > > > > > > of > > > > > > > KSPSolve (ex2.c). I let the code ran with > > a > > > > > > 1000x1000 > > > > > > > mesh and get a linear scaling of the CG > > solver > > > > and > > > > > > a > > > > > > > super linear scaling of the GMRes. These > > are > > > > both > > > > > > much > > > > > > > better than our code. However, I think the > > 2D > > > > > > nature > > > > > > > of the sample problem might help the > > scaling > > > > of > > > > > > the > > > > > > > code. So I would like to try some 3D > > example > > > > using > > > > > > the > > > > > > > KSPSolve. Unfortunately, I couldn't find > > such > > > > an > > > > > > > example either in the > > > > > > src/ksp/ksp/examples/tutorials > > > > > > > directory or by google search. There are a > > > > couple > > > > > > of > > > > > > > 3D examples in the > > > > src/ksp/ksp/examples/tutorials > > > > > > but > > > > > > > they are about the SNES not KSPSolve. If > > > > anyone > > > > > > can > > > > > > > provide me with such an example, I would > > > > really > > > > > > > appreciate it. > > > > > > > Thanks a lot. > > > > > > > > > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > > > > Finding fabulous fares is fun. > > > > > > > Let Yahoo! FareChase search your favorite > > > > travel > > > > > > sites to find flight and hotel bargains. > > > > > > > > > > > > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > > 8:00? 8:25? 8:40? Find a flick in no time > > > > > with the Yahoo! Search movie showtime > > shortcut. > > > > > http://tools.search.yahoo.com/shortcuts/#news > > > > > > > > === message truncated === > > > > > ____________________________________________________________________________________ > Finding fabulous fares is fun. > Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains. 
> http://farechase.yahoo.com/promo-generic-14795097 From jinzishuai at yahoo.com Fri Feb 9 22:56:02 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 9 Feb 2007 20:56:02 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <902001.46633.qm@web36203.mail.mud.yahoo.com> MatGetRow are used to build the right hand side vector. We use it in order to get the number of nonzero cols, global col indices and values in a row. The reason it is time consuming is that it is called for each row of the matrix. I am not sure how I can get away without it. Thanks. Shi --- Barry Smith wrote: > > What are all the calls for MatGetRow() for? They > are consuming a > great deal of time. Is there anyway to get rid of > them? > > Barry > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > Sorry that is not informative. > > So I decide to attach the 5 files for > NP=1,2,4,8,16 > > for > > the 400,000 finite element case. > > > > Please note that the simulation runs over 100 > steps. > > The 1st step is first order update, named as stage > 1. > > The rest 99 steps are second order updates. Within > > that, stage 2-9 are created for the 8 stages of a > > second order update. We should concentrate on the > > second order updates. So four calls to KSPSolve in > the > > log file are important, in stage 4,5,6,and 8 > > separately. > > Pleaes let me know if you need any other > information > > or explanation. > > Thank you very much. > > > > Shi > > --- Matthew Knepley wrote: > > > > > You really have to give us the log summary > output. > > > None of the relevant > > > numbers are in your summary. > > > > > > Thanks, > > > > > > Matt > > > > > > On 2/9/07, Shi Jin wrote: > > > > > > > > Dear Barry, > > > > > > > > Thank you. > > > > I actually have done the staging already. > > > > I summarized the timing of the runs in google > > > online > > > > spreadsheets. I have two runs. > > > > 1. with 400,000 finite elements: > > > > > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > > > > 2. with 1,600,000 finite elements: > > > > > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > > > > > > > If you can take a look at them and give me > some > > > > advice, I will be deeply grateful. > > > > > > > > Shi > > > > --- Barry Smith wrote: > > > > > > > > > > > > > > NO, NO, don't spend time stripping your > code! > > > > > Unproductive > > > > > > > > > > See the manul pages for > > > PetscLogStageRegister(), > > > > > PetscLogStagePush() and > > > > > PetscLogStagePop(). All you need to do is > > > maintain a > > > > > seperate stage for each > > > > > of your KSPSolves; in your case you'll > create 3 > > > > > stages. > > > > > > > > > > Barry > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > Thank you. > > > > > > But my code has 10 calls to KSPSolve of > three > > > > > > different linear systems at each time > update. > > > > > Should I > > > > > > strip it down to a single KSPSolve so that > it > > > is > > > > > > easier to analysis? I might have the code > dump > > > the > > > > > > Matrix and vector and write another code > to > > > read > > > > > them > > > > > > into and call KSPSolve. I don't know > whether > > > this > > > > > is > > > > > > worth doing or should I just send in the > > > messy > > > > > log > > > > > > file of the whole run. > > > > > > Thanks for any advice. 
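A guess at the loop being described here — visiting every locally owned row and pulling out its column indices and values — looks roughly like the sketch below. The matrix A, the per-row computation, and the use of the entries to assemble the right-hand side are assumptions, and the const qualifiers on the returned arrays vary between PETSc releases; what does not vary is that every MatGetRow() must be matched by a MatRestoreRow():

    PetscInt          rstart, rend, row, ncols;
    const PetscInt    *cols;
    const PetscScalar *vals;

    MatGetOwnershipRange(A, &rstart, &rend);
    for (row = rstart; row < rend; row++) {
      MatGetRow(A, row, &ncols, &cols, &vals);
      /* ... combine cols/vals with some known data to accumulate
             the right-hand-side entry for this row ... */
      MatRestoreRow(A, row, &ncols, &cols, &vals);
    }

Executed once per matrix row at every time step, this adds up, which is the cost being asked about above.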
> > > > > > > > > > > > Shi > > > > > > > > > > > > --- Barry Smith > wrote: > > > > > > > > > > > > > > > > > > > > Shi, > > > > > > > > > > > > > > There is never a better test problem > then > > > > > your > > > > > > > actual problem. > > > > > > > Send the results from running on 1, 4, > and 8 > > > > > > > processes with the options > > > > > > > -log_summary -ksp_view (use the > optimized > > > > > version of > > > > > > > PETSc (running > > > > > > > config/configure.py --with-debugging=0)) > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > > > I am tuning our 3D FEM CFD code > written > > > with > > > > > > > PETSc. > > > > > > > > The code doesn't scale very well. For > > > example, > > > > > > > with 8 > > > > > > > > processes on a linux cluster, the > speedup > > > we > > > > > > > achieve > > > > > > > > with a fairly large problem > size(million > > > of > > > > > > > elements) > > > > > > > > is only 3 to 4 using the Congugate > > > gradient > > > > > > > solver. We > > > > > > > > can achieve a speed up of a 6.5 using > a > > > GMRes > > > > > > > solver > > > > > > > > but the wall clock time of a GMRes is > > > longer > > > > > than > > > > > > > a CG > > > > > > > > solver which indicates that CG is the > > > faster > > > > > > > solver > > > > > > > > and it scales not as good as GMRes. Is > > > this > > > > > > > generally > > > > > > > > true? > > > > > > > > > > > > > > > > I then went to the examples and find a > 2D > > > > > example > > > > > > > of > > > > > > > > KSPSolve (ex2.c). I let the code ran > with > > > a > > > > > > > 1000x1000 > > > > > > > > mesh and get a linear scaling of the > CG > > > solver > > > > > and > > > > > > > a > > > > > > > > super linear scaling of the GMRes. > These > > > are > > > > > both > > > > > > > much > > > > > > > > better than our code. However, I think > the > > > 2D > > > > > > > nature > === message truncated === ____________________________________________________________________________________ Cheap talk? Check out Yahoo! Messenger's low PC-to-Phone call rates. http://voice.yahoo.com From balay at mcs.anl.gov Fri Feb 9 23:02:14 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 9 Feb 2007 23:02:14 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: References: <292553.57576.qm@web36210.mail.mud.yahoo.com> Message-ID: Just looking at 8 proc run [diffusion stage] we have: MatMult : 79 sec MatMultAdd : 2 sec VecScatterBegin: 17 sec VecScatterEnd : 51 sec So basically the communication in MatMult/Add is represented by VecScatters. Here out of 81 sec total - 68 seconds are used for communication [with a load imbalance of 11 for vecscaterend] So - I think MPI performance is reducing scalability here.. 
Things to try: * -vecstatter_rr etc options I sugested earlier * install mpich with '--with-device=ch3:ssm' and see if it makes a difference Satish --- Event Stage 4: Diffusion [x]rhsLtP 297 1.0 1.1017e+02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 39 0 0 0 0 0 [x]rhsGravity 99 1.0 4.2582e+0083.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 VecDot 4657 1.0 2.5748e+01 3.2 7.60e+07 3.2 0.0e+00 0.0e+00 4.7e+03 1 1 0 0 6 5 3 0 0 65 191 VecNorm 2477 1.0 2.2109e+01 2.2 3.22e+07 2.2 0.0e+00 0.0e+00 2.5e+03 1 0 0 0 3 5 2 0 0 35 118 VecScale 594 1.0 2.9330e-02 1.5 2.61e+08 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1361 VecCopy 594 1.0 2.7552e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 3665 1.0 6.0793e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 5251 1.0 2.5892e+00 1.2 3.31e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 4 0 0 0 2137 VecAYPX 1883 1.0 8.6419e-01 1.3 3.62e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 2296 VecScatterBegin 2873 1.0 1.7569e+01 3.0 0.00e+00 0.0 3.8e+04 1.6e+05 0.0e+00 1 0 10 20 0 5 0100100 0 0 VecScatterEnd 2774 1.0 5.1519e+0110.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 7 0 0 0 0 0 MatMult 2477 1.0 7.9186e+01 2.4 2.34e+08 2.4 3.5e+04 1.7e+05 0.0e+00 3 11 9 20 0 20 48 91 98 0 850 MatMultAdd 297 1.0 2.8161e+00 5.4 4.46e+07 2.2 3.6e+03 3.4e+04 0.0e+00 0 0 1 0 0 0 0 9 2 0 125 MatSolve 2477 1.0 6.2245e+01 1.2 1.41e+08 1.2 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 22 41 0 0 0 926 MatLUFactorNum 3 1.0 2.7686e-01 1.1 2.79e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2016 MatGetRow 19560420 1.0 5.5195e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 20 0 0 0 0 0 KSPSetup 6 1.0 3.0756e-05 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 297 1.0 1.3142e+02 1.0 1.31e+08 1.1 3.1e+04 1.7e+05 7.1e+03 8 22 8 18 9 50 93 80 86100 1001 PCSetUp 6 1.0 2.7700e-01 1.1 2.78e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2015 PCSetUpOnBlocks 297 1.0 2.7794e-01 1.1 2.78e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2008 PCApply 2477 1.0 6.2772e+01 1.2 1.39e+08 1.2 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 23 41 0 0 0 918 From bsmith at mcs.anl.gov Fri Feb 9 23:07:11 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 23:07:11 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <292553.57576.qm@web36210.mail.mud.yahoo.com> References: <292553.57576.qm@web36210.mail.mud.yahoo.com> Message-ID: Shi, The lack of good scaling is coming from two important sources. 1) The MPI on this system is terrible Average time to get PetscTime(): 1.71661e-06 Average time for MPI_Barrier(): 0.008253 Average time for zero size MPI_Send(): 0.000279441 you want to see numbers like 1.e-5 to 1.e-6 instead of 1e-3 to 1e-4 2) The number of iterations for the linear systems is growing too rapidly with more processes. For example in stage 8 it goes from 1782 iterations on 1 process to 3267 on 16 processors. 3) a lessor effect is from a slight inbalance in work between processes, for example in stage 8 the slowest MatSolve is 1.3 times the fastest. Initial suggestions. 0) Get rid of the MatGetRows() 1) it appears your matrices are symmetric? If so, you can use MATMPISBAIJ instead of AIJ, then you can use (incomplete) Cholesky on the blocks. 2) Try using ASM instead of block Jacobi as the preconditioner. 
Use -pc_type asm -pc_asm_type basic -sub_pc_type icc this will decrease the number of iterations in parallel at the cost of more expensive iterations so it may help or may not. 3) Try using hypre's boomeramg for some (poisson?) (all?) of the solves. config/configure.py PETSc with --download-hypre and run with -pc_type hypre -pc_hypre_type boomeramg (if you run this with -help it will show a large number of tuneable options that can really speed things up.) Final note: I would not expect to EVER see more than a speed up of more then say 10 to 12 on this machine, no matter how good the linear solver; due to the slowness of the network. But on a really good network you "might" be able to get 13 or 14 with hypre boomeramg. Barry On Fri, 9 Feb 2007, Shi Jin wrote: > Sorry that is not informative. > So I decide to attach the 5 files for NP=1,2,4,8,16 > for > the 400,000 finite element case. > > Please note that the simulation runs over 100 steps. > The 1st step is first order update, named as stage 1. > The rest 99 steps are second order updates. Within > that, stage 2-9 are created for the 8 stages of a > second order update. We should concentrate on the > second order updates. So four calls to KSPSolve in the > log file are important, in stage 4,5,6,and 8 > separately. > Pleaes let me know if you need any other information > or explanation. > Thank you very much. > > Shi > --- Matthew Knepley wrote: > > > You really have to give us the log summary output. > > None of the relevant > > numbers are in your summary. > > > > Thanks, > > > > Matt > > > > On 2/9/07, Shi Jin wrote: > > > > > > Dear Barry, > > > > > > Thank you. > > > I actually have done the staging already. > > > I summarized the timing of the runs in google > > online > > > spreadsheets. I have two runs. > > > 1. with 400,000 finite elements: > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > > > 2. with 1,600,000 finite elements: > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > > > > > If you can take a look at them and give me some > > > advice, I will be deeply grateful. > > > > > > Shi > > > --- Barry Smith wrote: > > > > > > > > > > > NO, NO, don't spend time stripping your code! > > > > Unproductive > > > > > > > > See the manul pages for > > PetscLogStageRegister(), > > > > PetscLogStagePush() and > > > > PetscLogStagePop(). All you need to do is > > maintain a > > > > seperate stage for each > > > > of your KSPSolves; in your case you'll create 3 > > > > stages. > > > > > > > > Barry > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > Thank you. > > > > > But my code has 10 calls to KSPSolve of three > > > > > different linear systems at each time update. > > > > Should I > > > > > strip it down to a single KSPSolve so that it > > is > > > > > easier to analysis? I might have the code dump > > the > > > > > Matrix and vector and write another code to > > read > > > > them > > > > > into and call KSPSolve. I don't know whether > > this > > > > is > > > > > worth doing or should I just send in the > > messy > > > > log > > > > > file of the whole run. > > > > > Thanks for any advice. > > > > > > > > > > Shi > > > > > > > > > > --- Barry Smith wrote: > > > > > > > > > > > > > > > > > Shi, > > > > > > > > > > > > There is never a better test problem then > > > > your > > > > > > actual problem. 
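The preconditioner experiments suggested above need no source changes as long as the solver is configured from the options database. A minimal sketch of the calls involved (object names are placeholders, error checking is omitted, and the MatStructure argument of KSPSetOperators() was dropped in later PETSc releases):

    #include "petscksp.h"

    KSP ksp;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
    KSPSetFromOptions(ksp);   /* picks up -ksp_type, -pc_type, -sub_pc_type, ... */
    KSPSolve(ksp, b, x);

    /* then, purely from the command line, e.g.
         <launcher> -np 8 ./mysolver -log_summary -ksp_view \
             -pc_type asm -pc_asm_type basic -sub_pc_type icc
       or, with a hypre-enabled build (--download-hypre),
         <launcher> -np 8 ./mysolver -log_summary -ksp_view \
             -pc_type hypre -pc_hypre_type boomeramg
       where ./mysolver and the launcher name are placeholders for the
       actual executable and MPI launch command. */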
> > > > > > Send the results from running on 1, 4, and 8 > > > > > > processes with the options > > > > > > -log_summary -ksp_view (use the optimized > > > > version of > > > > > > PETSc (running > > > > > > config/configure.py --with-debugging=0)) > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > I am tuning our 3D FEM CFD code written > > with > > > > > > PETSc. > > > > > > > The code doesn't scale very well. For > > example, > > > > > > with 8 > > > > > > > processes on a linux cluster, the speedup > > we > > > > > > achieve > > > > > > > with a fairly large problem size(million > > of > > > > > > elements) > > > > > > > is only 3 to 4 using the Congugate > > gradient > > > > > > solver. We > > > > > > > can achieve a speed up of a 6.5 using a > > GMRes > > > > > > solver > > > > > > > but the wall clock time of a GMRes is > > longer > > > > than > > > > > > a CG > > > > > > > solver which indicates that CG is the > > faster > > > > > > solver > > > > > > > and it scales not as good as GMRes. Is > > this > > > > > > generally > > > > > > > true? > > > > > > > > > > > > > > I then went to the examples and find a 2D > > > > example > > > > > > of > > > > > > > KSPSolve (ex2.c). I let the code ran with > > a > > > > > > 1000x1000 > > > > > > > mesh and get a linear scaling of the CG > > solver > > > > and > > > > > > a > > > > > > > super linear scaling of the GMRes. These > > are > > > > both > > > > > > much > > > > > > > better than our code. However, I think the > > 2D > > > > > > nature > > > > > > > of the sample problem might help the > > scaling > > > > of > > > > > > the > > > > > > > code. So I would like to try some 3D > > example > > > > using > > > > > > the > > > > > > > KSPSolve. Unfortunately, I couldn't find > > such > > > > an > > > > > > > example either in the > > > > > > src/ksp/ksp/examples/tutorials > > > > > > > directory or by google search. There are a > > > > couple > > > > > > of > > > > > > > 3D examples in the > > > > src/ksp/ksp/examples/tutorials > > > > > > but > > > > > > > they are about the SNES not KSPSolve. If > > > > anyone > > > > > > can > > > > > > > provide me with such an example, I would > > > > really > > > > > > > appreciate it. > > > > > > > Thanks a lot. > > > > > > > > > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > > > > Finding fabulous fares is fun. > > > > > > > Let Yahoo! FareChase search your favorite > > > > travel > > > > > > sites to find flight and hotel bargains. > > > > > > > > > > > > > http://farechase.yahoo.com/promo-generic-14795097 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > > 8:00? 8:25? 8:40? Find a flick in no time > > > > > with the Yahoo! Search movie showtime > > shortcut. > > > > > http://tools.search.yahoo.com/shortcuts/#news > > > > > > > > === message truncated === > > > > > ____________________________________________________________________________________ > Finding fabulous fares is fun. > Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains. 
> http://farechase.yahoo.com/promo-generic-14795097 From bsmith at mcs.anl.gov Fri Feb 9 23:09:50 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 9 Feb 2007 23:09:50 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <902001.46633.qm@web36203.mail.mud.yahoo.com> References: <902001.46633.qm@web36203.mail.mud.yahoo.com> Message-ID: On Fri, 9 Feb 2007, Shi Jin wrote: > MatGetRow are used to build the right hand side > vector. ^^^^^^ Huh? > We use it in order to get the number of nonzero cols, > global col indices and values in a row. Huh? What do you do with all this information? Maybe we can do what you do with this information much more efficiently? Without all the calls to MatGetRow(). Barry > > The reason it is time consuming is that it is called > for each row of the matrix. I am not sure how I can > get away without it. > Thanks. > > Shi > --- Barry Smith wrote: > > > > > What are all the calls for MatGetRow() for? They > > are consuming a > > great deal of time. Is there anyway to get rid of > > them? > > > > Barry > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > Sorry that is not informative. > > > So I decide to attach the 5 files for > > NP=1,2,4,8,16 > > > for > > > the 400,000 finite element case. > > > > > > Please note that the simulation runs over 100 > > steps. > > > The 1st step is first order update, named as stage > > 1. > > > The rest 99 steps are second order updates. Within > > > that, stage 2-9 are created for the 8 stages of a > > > second order update. We should concentrate on the > > > second order updates. So four calls to KSPSolve in > > the > > > log file are important, in stage 4,5,6,and 8 > > > separately. > > > Pleaes let me know if you need any other > > information > > > or explanation. > > > Thank you very much. > > > > > > Shi > > > --- Matthew Knepley wrote: > > > > > > > You really have to give us the log summary > > output. > > > > None of the relevant > > > > numbers are in your summary. > > > > > > > > Thanks, > > > > > > > > Matt > > > > > > > > On 2/9/07, Shi Jin wrote: > > > > > > > > > > Dear Barry, > > > > > > > > > > Thank you. > > > > > I actually have done the staging already. > > > > > I summarized the timing of the runs in google > > > > online > > > > > spreadsheets. I have two runs. > > > > > 1. with 400,000 finite elements: > > > > > > > > > > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > > > > > 2. with 1,600,000 finite elements: > > > > > > > > > > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > > > > > > > > > If you can take a look at them and give me > > some > > > > > advice, I will be deeply grateful. > > > > > > > > > > Shi > > > > > --- Barry Smith wrote: > > > > > > > > > > > > > > > > > NO, NO, don't spend time stripping your > > code! > > > > > > Unproductive > > > > > > > > > > > > See the manul pages for > > > > PetscLogStageRegister(), > > > > > > PetscLogStagePush() and > > > > > > PetscLogStagePop(). All you need to do is > > > > maintain a > > > > > > seperate stage for each > > > > > > of your KSPSolves; in your case you'll > > create 3 > > > > > > stages. > > > > > > > > > > > > Barry > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > Thank you. > > > > > > > But my code has 10 calls to KSPSolve of > > three > > > > > > > different linear systems at each time > > update. > > > > > > Should I > > > > > > > strip it down to a single KSPSolve so that > > it > > > > is > > > > > > > easier to analysis? 
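If the row-by-row accumulation is, in effect, multiplying the assembled matrix against a known vector to build the right-hand side, the whole loop collapses into one library call, which is presumably the kind of replacement being hinted at here. A sketch under that assumption (A, u_known, b0 and b are placeholder names):

    /* b = A * u_known, done by the library with no per-row traffic */
    MatMult(A, u_known, b);

    /* or, if a constant part b0 is added in as well: b = b0 + A * u_known */
    MatMultAdd(A, u_known, b0, b);

Whether this applies depends on what is actually done with the row entries, which is exactly the question being asked.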
I might have the code > > dump > > > > the > > > > > > > Matrix and vector and write another code > > to > > > > read > > > > > > them > > > > > > > into and call KSPSolve. I don't know > > whether > > > > this > > > > > > is > > > > > > > worth doing or should I just send in the > > > > messy > > > > > > log > > > > > > > file of the whole run. > > > > > > > Thanks for any advice. > > > > > > > > > > > > > > Shi > > > > > > > > > > > > > > --- Barry Smith > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Shi, > > > > > > > > > > > > > > > > There is never a better test problem > > then > > > > > > your > > > > > > > > actual problem. > > > > > > > > Send the results from running on 1, 4, > > and 8 > > > > > > > > processes with the options > > > > > > > > -log_summary -ksp_view (use the > > optimized > > > > > > version of > > > > > > > > PETSc (running > > > > > > > > config/configure.py --with-debugging=0)) > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > > > > > I am tuning our 3D FEM CFD code > > written > > > > with > > > > > > > > PETSc. > > > > > > > > > The code doesn't scale very well. For > > > > example, > > > > > > > > with 8 > > > > > > > > > processes on a linux cluster, the > > speedup > > > > we > > > > > > > > achieve > > > > > > > > > with a fairly large problem > > size(million > > > > of > > > > > > > > elements) > > > > > > > > > is only 3 to 4 using the Congugate > > > > gradient > > > > > > > > solver. We > > > > > > > > > can achieve a speed up of a 6.5 using > > a > > > > GMRes > > > > > > > > solver > > > > > > > > > but the wall clock time of a GMRes is > > > > longer > > > > > > than > > > > > > > > a CG > > > > > > > > > solver which indicates that CG is the > > > > faster > > > > > > > > solver > > > > > > > > > and it scales not as good as GMRes. Is > > > > this > > > > > > > > generally > > > > > > > > > true? > > > > > > > > > > > > > > > > > > I then went to the examples and find a > > 2D > > > > > > example > > > > > > > > of > > > > > > > > > KSPSolve (ex2.c). I let the code ran > > with > > > > a > > > > > > > > 1000x1000 > > > > > > > > > mesh and get a linear scaling of the > > CG > > > > solver > > > > > > and > > > > > > > > a > > > > > > > > > super linear scaling of the GMRes. > > These > > > > are > > > > > > both > > > > > > > > much > > > > > > > > > better than our code. However, I think > > the > > > > 2D > > > > > > > > nature > > > === message truncated === > > > > > ____________________________________________________________________________________ > Cheap talk? > Check out Yahoo! Messenger's low PC-to-Phone call rates. > http://voice.yahoo.com > > From zonexo at gmail.com Sat Feb 10 02:28:51 2007 From: zonexo at gmail.com (Ben Tay) Date: Sat, 10 Feb 2007 16:28:51 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> Message-ID: <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> Hi, I tried to use ex2f.F as a test code. I've changed the number n,m from 3 to 500 each. I ran the code using 1 processor and then with 4 processor. 
I then repeat the same with the following modification: do i=1,10 call KSPSolve(ksp,b,x,ierr) end do I've added to do loop to make the solving repeat 10 times. In both cases, the serial code is faster, e.g. 1 taking 2.4 min while the other 3.3 min. Here's the log_summary: ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ./ex2f on a linux-mpi named atlas12.nus.edu.sg with 4 processors, by g0306332 Sat Feb 10 16:21:36 2007 Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 Max Max/Min Avg Total Time (sec): 2.213e+02 1.00051 2.212e+02 Objects: 5.500e+01 1.00000 5.500e+01 Flops: 4.718e+09 1.00019 4.718e+09 1.887e+10 Flops/sec: 2.134e+07 1.00070 2.133e+07 8.531e+07 Memory: 3.186e+07 1.00069 1.274e+08 MPI Messages: 1.832e+03 2.00000 1.374e+03 5.496e+03 MPI Message Lengths: 7.324e+06 2.00000 3.998e+03 2.197e+07 MPI Reductions: 7.112e+02 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.2120e+02 100.0% 1.8871e+10 100.0% 5.496e+03 100.0% 3.998e+03 100.0% 2.845e+03 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops/sec: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run config/configure.py # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## ########################################################## # # # WARNING!!! # # # # This code was run without the PreLoadBegin() # # macros. To get timing results we always recommend # # preloading. otherwise timing numbers may be # # meaningless. 
# ########################################################## Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 915 1.0 4.4291e+01 1.3 1.50e+07 1.3 5.5e+03 4.0e+03 0.0e+00 18 11100100 0 18 11100100 0 46 MatSolve 915 1.0 1.5684e+01 1.1 3.56e+07 1.1 0.0e+00 0.0e+00 0.0e+00 7 11 0 0 0 7 11 0 0 0 131 MatLUFactorNum 1 1.0 5.1654e-02 1.4 1.48e+07 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 MatILUFactorSym 1 1.0 1.6838e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1 1.0 3.2428e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 1.3120e+00 1.1 0.00e+00 0.0 6.0e+00 2.0e+03 1.3e+01 1 0 0 0 0 1 0 0 0 0 0 MatGetOrdering 1 1.0 4.1590e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 885 1.0 8.5091e+01 1.1 2.27e+07 1.1 0.0e+00 0.0e+00 8.8e+02 36 36 0 0 31 36 36 0 0 31 80 VecNorm 916 1.0 6.6747e+01 1.1 1.81e+06 1.1 0.0e+00 0.0e+00 9.2e+02 29 2 0 0 32 29 2 0 0 32 7 VecScale 915 1.0 1.1430e+00 2.2 1.12e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 200 VecCopy 30 1.0 1.2816e-01 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 947 1.0 7.8979e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 60 1.0 5.5332e-02 1.1 1.51e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 542 VecMAXPY 915 1.0 1.5004e+01 1.3 1.54e+08 1.3 0.0e+00 0.0e+00 0.0e+00 6 38 0 0 0 6 38 0 0 0 483 VecScatterBegin 915 1.0 9.0358e-02 1.4 0.00e+00 0.0 5.5e+03 4.0e+03 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 915 1.0 3.5136e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 VecNormalize 915 1.0 6.7272e+01 1.0 2.68e+06 1.0 0.0e+00 0.0e+00 9.2e+02 30 4 0 0 32 30 4 0 0 32 10 KSPGMRESOrthog 885 1.0 9.8478e+01 1.1 3.87e+07 1.1 0.0e+00 0.0e+00 8.8e+02 42 72 0 0 31 42 72 0 0 31 138 KSPSetup 2 1.0 6.1918e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 2.1892e+02 1.0 2.15e+07 1.0 5.5e+03 4.0e+03 2.8e+03 99100100100 99 99100100100 99 86 PCSetUp 2 1.0 7.3292e-02 1.3 9.84e+06 1.3 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 30 PCSetUpOnBlocks 1 1.0 7.2706e-02 1.3 9.97e+06 1.3 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 31 PCApply 915 1.0 1.6508e+01 1.1 3.27e+07 1.1 0.0e+00 0.0e+00 9.2e+02 7 11 0 0 32 7 11 0 0 32 124 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. 
--- Event Stage 0: Main Stage Matrix 4 4 252008 0 Index Set 5 5 753096 0 Vec 41 41 18519984 0 Vec Scatter 1 1 0 0 Krylov Solver 2 2 16880 0 Preconditioner 2 2 196 0 ======================================================================================================================== Average time to get PetscTime(): 1.09673e-06 Average time for MPI_Barrier(): 4.18186e-05 Average time for zero size MPI_Send(): 2.62856e-05 OptionTable: -log_summary Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8 Configure run at: Thu Jan 18 12:23:31 2007 Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/ ----------------------------------------- Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 Using PETSc arch: linux-mpif90 ----------------------------------------- Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w ----------------------------------------- Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8-I/nas/lsftmp/g0306332/petsc- 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include -I/opt/mpich/myrinet/intel/include ------------------------------------------ Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\ -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl ------------------------------------------ So is there something wrong with the server's mpi implementation? Thank you. On 2/10/07, Satish Balay wrote: > > Looks like MatMult = 24sec Out of this the scatter time is: 22sec. > Either something is wrong with your run - or MPI is really broken.. > > Satish > > > > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 > 1.3e+03 > > > > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 > 1.3e+03 > > > > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zonexo at gmail.com Sat Feb 10 03:17:52 2007 From: zonexo at gmail.com (Ben Tay) Date: Sat, 10 Feb 2007 17:17:52 +0800 Subject: understanding the output from -info In-Reply-To: <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> Message-ID: <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> Hi, I've repeated the test with n,m = 800. Now serial takes around 11mins while parallel with 4 processors took 6mins. Does it mean that the problem must be pretty large before it is more superior to use parallel? Moreover 800x800 means there's 640000 unknowns. My problem is a 2D CFD code which typically has 200x80=16000 unknowns. Does it mean that I won't be able to benefit from running in parallel? Btw, this is the parallel's log_summary: Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage MatMult 1265 1.0 7.0615e+01 1.2 3.22e+07 1.2 7.6e+03 6.4e+03 0.0e+00 16 11100100 0 16 11100100 0 103 MatSolve 1265 1.0 4.7820e+01 1.2 4.60e+07 1.2 0.0e+00 0.0e+00 0.0e+00 11 11 0 0 0 11 11 0 0 0 152 MatLUFactorNum 1 1.0 2.5703e-01 2.3 1.27e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 22 MatILUFactorSym 1 1.0 1.8933e-01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1 1.0 4.2153e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1 1.0 3.6475e-01 1.5 0.00e+00 0.0 6.0e+00 3.2e+03 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.2088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecMDot 1224 1.0 1.5314e+02 1.2 4.63e+07 1.2 0.0e+00 0.0e+00 1.2e+03 36 36 0 0 31 36 36 0 0 31 158 VecNorm 1266 1.0 1.0215e+02 1.1 4.31e+06 1.1 0.0e+00 0.0e+00 1.3e+03 24 2 0 0 33 24 2 0 0 33 16 VecScale 1265 1.0 3.7467e+00 1.5 8.34e+07 1.5 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 216 VecCopy 41 1.0 2.5530e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1308 1.0 3.2717e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 82 1.0 5.3338e-01 2.8 1.40e+08 2.8 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 VecMAXPY 1265 1.0 4.6234e+01 1.2 1.74e+08 1.2 0.0e+00 0.0e+00 0.0e+00 10 38 0 0 0 10 38 0 0 0 557 VecScatterBegin 1265 1.0 1.5684e-01 1.6 0.00e+00 0.0 7.6e+03 6.4e+03 0.0e+00 0 0100100 0 0 0100100 0 0 VecScatterEnd 1265 1.0 4.3167e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 VecNormalize 1265 1.0 1.0459e+02 1.1 6.21e+06 1.1 0.0e+00 0.0e+00 1.3e+03 25 4 0 0 32 25 4 0 0 32 23 KSPGMRESOrthog 1224 1.0 1.9035e+02 1.1 7.00e+07 1.1 0.0e+00 0.0e+00 1.2e+03 45 72 0 0 31 45 72 0 0 31 254 KSPSetup 2 1.0 5.1674e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 1 1.0 4.0269e+02 1.0 4.16e+07 1.0 7.6e+03 6.4e+03 3.9e+03 99100100100 99 99100100100 99 166 PCSetUp 2 1.0 4.5924e-01 2.6 8.23e+06 2.6 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 12 PCSetUpOnBlocks 1 1.0 4.5847e-01 2.6 8.26e+06 2.6 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 13 PCApply 1265 1.0 5.0990e+01 1.2 4.33e+07 1.2 
0.0e+00 0.0e+00 1.3e+03 12 11 0 0 32 12 11 0 0 32 143 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. --- Event Stage 0: Main Stage Matrix 4 4 643208 0 Index Set 5 5 1924296 0 Vec 41 41 47379984 0 Vec Scatter 1 1 0 0 Krylov Solver 2 2 16880 0 Preconditioner 2 2 196 0 ======================================================================================================================== Average time to get PetscTime(): 1.00136e-06 Average time for MPI_Barrier(): 4.00066e-05 Average time for zero size MPI_Send(): 1.70469e-05 OptionTable: -log_summary Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8 Configure run at: Thu Jan 18 12:23:31 2007 Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/ ----------------------------------------- On 2/10/07, Ben Tay wrote: > > Hi, > > I tried to use ex2f.F as a test code. I've changed the number n,m from 3 > to 500 each. I ran the code using 1 processor and then with 4 processor. I > then repeat the same with the following modification: > > > do i=1,10 > > call KSPSolve(ksp,b,x,ierr) > > end do > I've added to do loop to make the solving repeat 10 times. > > In both cases, the serial code is faster, e.g. 1 taking 2.4 min while the > other 3.3 min. > > Here's the log_summary: > > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > ./ex2f on a linux-mpi named atlas12.nus.edu.sg with 4 processors, by > g0306332 Sat Feb 10 16:21:36 2007 > Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 > HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 > > Max Max/Min Avg Total > Time (sec): 2.213e+02 1.00051 2.212e+02 > Objects: 5.500e+01 1.00000 5.500e+01 > Flops: 4.718e+09 1.00019 4.718e+09 1.887e+10 > Flops/sec: 2.134e+07 1.00070 2.133e+07 8.531e+07 > > Memory: 3.186e+07 1.00069 1.274e+08 > MPI Messages: 1.832e+03 2.00000 1.374e+03 5.496e+03 > MPI Message Lengths: 7.324e+06 2.00000 3.998e+03 2.197e+07 > MPI Reductions: 7.112e+02 1.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flops > and VecAXPY() for complex vectors of length N > --> 8N flops > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 2.2120e+02 100.0% 1.8871e+10 100.0% 5.496e+03 > 100.0% 3.998e+03 100.0% 2.845e+03 100.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flops/sec: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all > processors > Mess: number of messages sent > Avg. len: average message length > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flops in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > over all processors) > > ------------------------------------------------------------------------------------------------------------------------ > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run config/configure.py # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was run without the PreLoadBegin() # > # macros. To get timing results we always recommend # > # preloading. otherwise timing numbers may be # > # meaningless. # > ########################################################## > > > Event Count Time (sec) > Flops/sec --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 915 1.0 4.4291e+01 1.3 1.50e+07 1.3 5.5e+03 4.0e+03 > 0.0e+00 18 11100100 0 18 11100100 0 46 > MatSolve 915 1.0 1.5684e+01 1.1 3.56e+07 1.1 0.0e+00 0.0e+00 > 0.0e+00 7 11 0 0 0 7 11 0 0 0 131 > MatLUFactorNum 1 1.0 5.1654e-02 1.4 1.48e+07 1.4 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 > MatILUFactorSym 1 1.0 1.6838e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 3.2428e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 1.3120e+00 1.1 0.00e+00 0.0 6.0e+00 2.0e+03 > 1.3e+01 1 0 0 0 0 1 0 0 0 0 0 > MatGetOrdering 1 1.0 4.1590e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 885 1.0 8.5091e+01 1.1 2.27e+07 1.1 0.0e+00 0.0e+00 > 8.8e+02 36 36 0 0 31 36 36 0 0 31 80 > VecNorm 916 1.0 6.6747e+01 1.1 1.81e+06 1.1 0.0e+00 0.0e+00 > 9.2e+02 29 2 0 0 32 29 2 0 0 32 7 > VecScale 915 1.0 1.1430e+00 2.2 1.12e+08 2.2 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 200 > VecCopy 30 1.0 1.2816e-01 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 947 1.0 7.8979e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 60 1.0 5.5332e-02 1.1 1.51e+08 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 542 > VecMAXPY 915 1.0 1.5004e+01 1.3 1.54e+08 1.3 0.0e+00 0.0e+00 > 0.0e+00 6 38 0 0 0 6 38 0 0 0 483 > VecScatterBegin 915 1.0 9.0358e-02 1.4 0.00e+00 0.0 5.5e+03 4.0e+03 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 915 1.0 3.5136e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 > VecNormalize 915 1.0 6.7272e+01 1.0 2.68e+06 1.0 0.0e+00 0.0e+00 > 9.2e+02 30 4 0 0 32 30 4 0 0 32 10 > KSPGMRESOrthog 885 1.0 9.8478e+01 1.1 3.87e+07 1.1 0.0e+00 0.0e+00 > 8.8e+02 42 72 0 0 31 42 72 0 0 31 138 > KSPSetup 2 1.0 6.1918e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 2.1892e+02 1.0 2.15e+07 1.0 5.5e+03 4.0e+03 > 2.8e+03 99100100100 99 99100100100 99 86 > PCSetUp 2 1.0 7.3292e-02 1.3 9.84e+06 1.3 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 0 0 0 0 0 0 30 > PCSetUpOnBlocks 1 1.0 
7.2706e-02 1.3 9.97e+06 1.3 0.0e+00 0.0e+00 > 4.0e+00 0 0 0 0 0 0 0 0 0 0 31 > PCApply 915 1.0 1.6508e+01 1.1 3.27e+07 1.1 0.0e+00 0.0e+00 > 9.2e+02 7 11 0 0 32 7 11 0 0 32 124 > ------------------------------------------------------------------------------------------------------------------------ > > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > > --- Event Stage 0: Main Stage > > Matrix 4 4 252008 0 > Index Set 5 5 753096 0 > Vec 41 41 18519984 0 > Vec Scatter 1 1 0 0 > Krylov Solver 2 2 16880 0 > Preconditioner 2 2 196 0 > ======================================================================================================================== > > Average time to get PetscTime(): 1.09673e-06 > Average time for MPI_Barrier(): 4.18186e-05 > Average time for zero size MPI_Send(): 2.62856e-05 > OptionTable: -log_summary > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > sizeof(PetscScalar) 8 > Configure run at: Thu Jan 18 12:23:31 2007 > Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > --with-mpi-dir=/opt/mpich/myrinet/intel/ > ----------------------------------------- > Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg > Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP > Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux > Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 > Using PETSc arch: linux-mpif90 > ----------------------------------------- > Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > -w90 -w > ----------------------------------------- > Using include paths: -I/nas/lsftmp/g0306332/petsc- 2.3.2-p8-I/nas/lsftmp/g0306332/petsc- > 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include > -I/opt/mpich/myrinet/intel/include > ------------------------------------------ > Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > -w90 -w > Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 > -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts > -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc > -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 > -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide > -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib > -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm > -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa > -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90 > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\ > -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl > ------------------------------------------ > > So is there something wrong with the server's mpi implementation? 
> > Thank you. > > > > On 2/10/07, Satish Balay wrote: > > > > Looks like MatMult = 24sec Out of this the scatter time is: 22sec. > > Either something is wrong with your run - or MPI is really broken.. > > > > Satish > > > > > > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 > > 1.3e+03 > > > > > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 > > 1.3e+03 > > > > > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 10 13:06:15 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 10 Feb 2007 13:06:15 -0600 (CST) Subject: understanding the output from -info In-Reply-To: <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090724n73db6f8w574622903161eb4a@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> Message-ID: On Sat, 10 Feb 2007, Ben Tay wrote: > Hi, > > I've repeated the test with n,m = 800. Now serial takes around 11mins while > parallel with 4 processors took 6mins. Does it mean that the problem must be > pretty large before it is more superior to use parallel? Moreover 800x800 > means there's 640000 unknowns. My problem is a 2D CFD code which typically > has 200x80=16000 unknowns. Does it mean that I won't be able to benefit from ^^^^^^^^^^^ You'll never get much performance past 2 processors; its not even worth all the work of having a parallel code in this case. I'd just optimize the heck out of the serial code. Barry > running in parallel? 
> > Btw, this is the parallel's log_summary: > > > Event Count Time (sec) > Flops/sec --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > MatMult 1265 1.0 7.0615e+01 1.2 3.22e+07 1.2 7.6e+03 6.4e+03 > 0.0e+00 16 11100100 0 16 11100100 0 103 > MatSolve 1265 1.0 4.7820e+01 1.2 4.60e+07 1.2 0.0e+00 0.0e+00 > 0.0e+00 11 11 0 0 0 11 11 0 0 0 152 > MatLUFactorNum 1 1.0 2.5703e-01 2.3 1.27e+07 2.3 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 22 > MatILUFactorSym 1 1.0 1.8933e-01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1 1.0 4.2153e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1 1.0 3.6475e-01 1.5 0.00e+00 0.0 6.0e+00 3.2e+03 > 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.2088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecMDot 1224 1.0 1.5314e+02 1.2 4.63e+07 1.2 0.0e+00 0.0e+00 > 1.2e+03 36 36 0 0 31 36 36 0 0 31 158 > VecNorm 1266 1.0 1.0215e+02 1.1 4.31e+06 1.1 0.0e+00 0.0e+00 > 1.3e+03 24 2 0 0 33 24 2 0 0 33 16 > VecScale 1265 1.0 3.7467e+00 1.5 8.34e+07 1.5 0.0e+00 0.0e+00 > 0.0e+00 1 1 0 0 0 1 1 0 0 0 216 > VecCopy 41 1.0 2.5530e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 1308 1.0 3.2717e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 82 1.0 5.3338e-01 2.8 1.40e+08 2.8 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 > VecMAXPY 1265 1.0 4.6234e+01 1.2 1.74e+08 1.2 0.0e+00 0.0e+00 > 0.0e+00 10 38 0 0 0 10 38 0 0 0 557 > VecScatterBegin 1265 1.0 1.5684e-01 1.6 0.00e+00 0.0 7.6e+03 6.4e+03 > 0.0e+00 0 0100100 0 0 0100100 0 0 > VecScatterEnd 1265 1.0 4.3167e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 > VecNormalize 1265 1.0 1.0459e+02 1.1 6.21e+06 1.1 0.0e+00 0.0e+00 > 1.3e+03 25 4 0 0 32 25 4 0 0 32 23 > KSPGMRESOrthog 1224 1.0 1.9035e+02 1.1 7.00e+07 1.1 0.0e+00 0.0e+00 > 1.2e+03 45 72 0 0 31 45 72 0 0 31 254 > KSPSetup 2 1.0 5.1674e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 1 1.0 4.0269e+02 1.0 4.16e+07 1.0 7.6e+03 6.4e+03 > 3.9e+03 99100100100 99 99100100100 99 166 > PCSetUp 2 1.0 4.5924e-01 2.6 8.23e+06 2.6 0.0e+00 0.0e+00 > 6.0e+00 0 0 0 0 0 0 0 0 0 0 12 > PCSetUpOnBlocks 1 1.0 4.5847e-01 2.6 8.26e+06 2.6 0.0e+00 0.0e+00 > 4.0e+00 0 0 0 0 0 0 0 0 0 0 13 > PCApply 1265 1.0 5.0990e+01 1.2 4.33e+07 1.2 0.0e+00 0.0e+00 > 1.3e+03 12 11 0 0 32 12 11 0 0 32 143 > ------------------------------------------------------------------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. 
> > --- Event Stage 0: Main Stage > > Matrix 4 4 643208 0 > Index Set 5 5 1924296 0 > Vec 41 41 47379984 0 > Vec Scatter 1 1 0 0 > Krylov Solver 2 2 16880 0 > Preconditioner 2 2 196 0 > ======================================================================================================================== > Average time to get PetscTime(): 1.00136e-06 > Average time for MPI_Barrier(): 4.00066e-05 > Average time for zero size MPI_Send(): 1.70469e-05 > OptionTable: -log_summary > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > sizeof(PetscScalar) 8 > Configure run at: Thu Jan 18 12:23:31 2007 > Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > --with-mpi-dir=/opt/mpich/myrinet/intel/ > ----------------------------------------- > > > > > > > > On 2/10/07, Ben Tay wrote: > > > > Hi, > > > > I tried to use ex2f.F as a test code. I've changed the number n,m from 3 > > to 500 each. I ran the code using 1 processor and then with 4 processor. I > > then repeat the same with the following modification: > > > > > > do i=1,10 > > > > call KSPSolve(ksp,b,x,ierr) > > > > end do > > I've added to do loop to make the solving repeat 10 times. > > > > In both cases, the serial code is faster, e.g. 1 taking 2.4 min while the > > other 3.3 min. > > > > Here's the log_summary: > > > > > > ---------------------------------------------- PETSc Performance Summary: > > ---------------------------------------------- > > > > ./ex2f on a linux-mpi named atlas12.nus.edu.sg with 4 processors, by > > g0306332 Sat Feb 10 16:21:36 2007 > > Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 > > HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 > > > > Max Max/Min Avg Total > > Time (sec): 2.213e+02 1.00051 2.212e+02 > > Objects: 5.500e+01 1.00000 5.500e+01 > > Flops: 4.718e+09 1.00019 4.718e+09 1.887e+10 > > Flops/sec: 2.134e+07 1.00070 2.133e+07 8.531e+07 > > > > Memory: 3.186e+07 1.00069 1.274e+08 > > MPI Messages: 1.832e+03 2.00000 1.374e+03 5.496e+03 > > MPI Message Lengths: 7.324e+06 2.00000 3.998e+03 2.197e+07 > > MPI Reductions: 7.112e+02 1.00000 > > > > Flop counting convention: 1 flop = 1 real number operation of type > > (multiply/divide/add/subtract) > > e.g., VecAXPY() for real vectors of length N > > --> 2N flops > > and VecAXPY() for complex vectors of length N > > --> 8N flops > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > > --- -- Message Lengths -- -- Reductions -- > > Avg %Total Avg %Total counts > > %Total Avg %Total counts %Total > > 0: Main Stage: 2.2120e+02 100.0% 1.8871e+10 100.0% 5.496e+03 > > 100.0% 3.998e+03 100.0% 2.845e+03 100.0% > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > See the 'Profiling' chapter of the users' manual for details on > > interpreting output. > > Phase summary info: > > Count: number of times phase was executed > > Time and Flops/sec: Max - maximum over all processors > > Ratio - ratio of maximum to minimum over all > > processors > > Mess: number of messages sent > > Avg. len: average message length > > Reduct: number of global reductions > > Global: entire computation > > Stage: stages of a computation. Set stages with PetscLogStagePush() and > > PetscLogStagePop(). 
> > %T - percent time in this phase %F - percent flops in this > > phase > > %M - percent messages in this phase %L - percent message lengths > > in this phase > > %R - percent reductions in this phase > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > > over all processors) > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > ########################################################## > > # # > > # WARNING!!! # > > # # > > # This code was compiled with a debugging option, # > > # To get timing results run config/configure.py # > > # using --with-debugging=no, the performance will # > > # be generally two or three times faster. # > > # # > > ########################################################## > > > > > > > > > > ########################################################## > > # # > > # WARNING!!! # > > # # > > # This code was run without the PreLoadBegin() # > > # macros. To get timing results we always recommend # > > # preloading. otherwise timing numbers may be # > > # meaningless. # > > ########################################################## > > > > > > Event Count Time (sec) > > Flops/sec --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > MatMult 915 1.0 4.4291e+01 1.3 1.50e+07 1.3 5.5e+03 4.0e+03 > > 0.0e+00 18 11100100 0 18 11100100 0 46 > > MatSolve 915 1.0 1.5684e+01 1.1 3.56e+07 1.1 0.0e+00 0.0e+00 > > 0.0e+00 7 11 0 0 0 7 11 0 0 0 131 > > MatLUFactorNum 1 1.0 5.1654e-02 1.4 1.48e+07 1.4 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 > > MatILUFactorSym 1 1.0 1.6838e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 1 1.0 3.2428e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1 1.0 1.3120e+00 1.1 0.00e+00 0.0 6.0e+00 2.0e+03 > > 1.3e+01 1 0 0 0 0 1 0 0 0 0 0 > > MatGetOrdering 1 1.0 4.1590e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecMDot 885 1.0 8.5091e+01 1.1 2.27e+07 1.1 0.0e+00 0.0e+00 > > 8.8e+02 36 36 0 0 31 36 36 0 0 31 80 > > VecNorm 916 1.0 6.6747e+01 1.1 1.81e+06 1.1 0.0e+00 0.0e+00 > > 9.2e+02 29 2 0 0 32 29 2 0 0 32 7 > > VecScale 915 1.0 1.1430e+00 2.2 1.12e+08 2.2 0.0e+00 0.0e+00 > > 0.0e+00 0 1 0 0 0 0 1 0 0 0 200 > > VecCopy 30 1.0 1.2816e-01 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 947 1.0 7.8979e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAXPY 60 1.0 5.5332e-02 1.1 1.51e+08 1.1 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 542 > > VecMAXPY 915 1.0 1.5004e+01 1.3 1.54e+08 1.3 0.0e+00 0.0e+00 > > 0.0e+00 6 38 0 0 0 6 38 0 0 0 483 > > VecScatterBegin 915 1.0 9.0358e-02 1.4 0.00e+00 0.0 5.5e+03 4.0e+03 > > 0.0e+00 0 0100100 0 0 0100100 0 0 > > VecScatterEnd 915 1.0 3.5136e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 > > VecNormalize 915 1.0 6.7272e+01 1.0 2.68e+06 1.0 0.0e+00 0.0e+00 > > 9.2e+02 30 4 0 0 32 30 4 0 0 32 10 > > KSPGMRESOrthog 885 1.0 9.8478e+01 1.1 3.87e+07 1.1 0.0e+00 0.0e+00 > > 8.8e+02 42 72 0 0 31 42 72 0 0 31 138 > > KSPSetup 2 1.0 6.1918e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 2.1892e+02 1.0 2.15e+07 1.0 
5.5e+03 4.0e+03 > > 2.8e+03 99100100100 99 99100100100 99 86 > > PCSetUp 2 1.0 7.3292e-02 1.3 9.84e+06 1.3 0.0e+00 0.0e+00 > > 6.0e+00 0 0 0 0 0 0 0 0 0 0 30 > > PCSetUpOnBlocks 1 1.0 7.2706e-02 1.3 9.97e+06 1.3 0.0e+00 0.0e+00 > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 31 > > PCApply 915 1.0 1.6508e+01 1.1 3.27e+07 1.1 0.0e+00 0.0e+00 > > 9.2e+02 7 11 0 0 32 7 11 0 0 32 124 > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > Memory usage is given in bytes: > > > > Object Type Creations Destructions Memory Descendants' Mem. > > > > --- Event Stage 0: Main Stage > > > > Matrix 4 4 252008 0 > > Index Set 5 5 753096 0 > > Vec 41 41 18519984 0 > > Vec Scatter 1 1 0 0 > > Krylov Solver 2 2 16880 0 > > Preconditioner 2 2 196 0 > > ======================================================================================================================== > > > > Average time to get PetscTime(): 1.09673e-06 > > Average time for MPI_Barrier(): 4.18186e-05 > > Average time for zero size MPI_Send(): 2.62856e-05 > > OptionTable: -log_summary > > Compiled without FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > > sizeof(PetscScalar) 8 > > Configure run at: Thu Jan 18 12:23:31 2007 > > Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared > > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > > --with-mpi-dir=/opt/mpich/myrinet/intel/ > > ----------------------------------------- > > Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg > > Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP > > Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux > > Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 > > Using PETSc arch: linux-mpif90 > > ----------------------------------------- > > Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > > Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > > -w90 -w > > ----------------------------------------- > > Using include paths: -I/nas/lsftmp/g0306332/petsc- > > 2.3.2-p8-I/nas/lsftmp/g0306332/petsc- > > 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include > > -I/opt/mpich/myrinet/intel/include > > ------------------------------------------ > > Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > > Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. 
-fPIC -g > > -w90 -w > > Using libraries: > > -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 > > -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts > > -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc > > -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 > > -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide > > -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib > > -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm > > -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > > -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa > > -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90 > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > > -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\ > > -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib > > -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl > > ------------------------------------------ > > > > So is there something wrong with the server's mpi implementation? > > > > Thank you. > > > > > > > > On 2/10/07, Satish Balay wrote: > > > > > > Looks like MatMult = 24sec Out of this the scatter time is: 22sec. > > > Either something is wrong with your run - or MPI is really broken.. > > > > > > Satish > > > > > > > > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 > > > 1.3e+03 > > > > > > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 > > > 1.3e+03 > > > > > > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 > > > 0.0e+00 > > > > > > > > > From balay at mcs.anl.gov Sat Feb 10 13:11:03 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 10 Feb 2007 13:11:03 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: References: <292553.57576.qm@web36210.mail.mud.yahoo.com> Message-ID: Can you send the optupt from the following runs. You can do this with src/ksp/ksp/examples/tutorials/ex2.c - to keep things simple. petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) petscmpirun -n 2 taskset -c 0,12 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) petscmpirun -n 2 taskset -c 0,14 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Satish From billy at dem.uminho.pt Sat Feb 10 12:50:50 2007 From: billy at dem.uminho.pt (billy at dem.uminho.pt) Date: Sat, 10 Feb 2007 18:50:50 +0000 Subject: A 3D example of KSPSolve? In-Reply-To: References: <902001.46633.qm@web36203.mail.mud.yahoo.com> Message-ID: <1171133450.45ce140ad527c@serv-g1.ccom.uminho.pt> Hi, Lately I was using 2D examples and when I changed to 3D I noticed bad performance. When I checked the code it was not allocating enough memory for 3D. Instead of 6 nonzeros in each it row it had 7 and performance went down very significantly as mentioned in PETSc manual. Billy. Quoting Barry Smith : > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > MatGetRow are used to build the right hand side > > vector. > ^^^^^^ > > Huh? 
> > > We use it in order to get the number of nonzero cols, > > global col indices and values in a row. > > Huh? What do you do with all this information? Maybe > we can do what you do with this information much more efficiently? > Without all the calls to MatGetRow(). > > Barry > > > > > The reason it is time consuming is that it is called > > for each row of the matrix. I am not sure how I can > > get away without it. > > Thanks. > > > > Shi > > --- Barry Smith wrote: > > > > > > > > What are all the calls for MatGetRow() for? They > > > are consuming a > > > great deal of time. Is there anyway to get rid of > > > them? > > > > > > Barry > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > Sorry that is not informative. > > > > So I decide to attach the 5 files for > > > NP=1,2,4,8,16 > > > > for > > > > the 400,000 finite element case. > > > > > > > > Please note that the simulation runs over 100 > > > steps. > > > > The 1st step is first order update, named as stage > > > 1. > > > > The rest 99 steps are second order updates. Within > > > > that, stage 2-9 are created for the 8 stages of a > > > > second order update. We should concentrate on the > > > > second order updates. So four calls to KSPSolve in > > > the > > > > log file are important, in stage 4,5,6,and 8 > > > > separately. > > > > Pleaes let me know if you need any other > > > information > > > > or explanation. > > > > Thank you very much. > > > > > > > > Shi > > > > --- Matthew Knepley wrote: > > > > > > > > > You really have to give us the log summary > > > output. > > > > > None of the relevant > > > > > numbers are in your summary. > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > On 2/9/07, Shi Jin wrote: > > > > > > > > > > > > Dear Barry, > > > > > > > > > > > > Thank you. > > > > > > I actually have done the staging already. > > > > > > I summarized the timing of the runs in google > > > > > online > > > > > > spreadsheets. I have two runs. > > > > > > 1. with 400,000 finite elements: > > > > > > > > > > > > > > > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZeDZlucTjEIA > > > > > > 2. with 1,600,000 finite elements: > > > > > > > > > > > > > > > > > > > > http://spreadsheets.google.com/pub?key=pZHoqlL60quZcCVLAqmzqQQ > > > > > > > > > > > > If you can take a look at them and give me > > > some > > > > > > advice, I will be deeply grateful. > > > > > > > > > > > > Shi > > > > > > --- Barry Smith wrote: > > > > > > > > > > > > > > > > > > > > NO, NO, don't spend time stripping your > > > code! > > > > > > > Unproductive > > > > > > > > > > > > > > See the manul pages for > > > > > PetscLogStageRegister(), > > > > > > > PetscLogStagePush() and > > > > > > > PetscLogStagePop(). All you need to do is > > > > > maintain a > > > > > > > seperate stage for each > > > > > > > of your KSPSolves; in your case you'll > > > create 3 > > > > > > > stages. > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > > > Thank you. > > > > > > > > But my code has 10 calls to KSPSolve of > > > three > > > > > > > > different linear systems at each time > > > update. > > > > > > > Should I > > > > > > > > strip it down to a single KSPSolve so that > > > it > > > > > is > > > > > > > > easier to analysis? I might have the code > > > dump > > > > > the > > > > > > > > Matrix and vector and write another code > > > to > > > > > read > > > > > > > them > > > > > > > > into and call KSPSolve. 
I don't know > > > whether > > > > > this > > > > > > > is > > > > > > > > worth doing or should I just send in the > > > > > messy > > > > > > > log > > > > > > > > file of the whole run. > > > > > > > > Thanks for any advice. > > > > > > > > > > > > > > > > Shi > > > > > > > > > > > > > > > > --- Barry Smith > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Shi, > > > > > > > > > > > > > > > > > > There is never a better test problem > > > then > > > > > > > your > > > > > > > > > actual problem. > > > > > > > > > Send the results from running on 1, 4, > > > and 8 > > > > > > > > > processes with the options > > > > > > > > > -log_summary -ksp_view (use the > > > optimized > > > > > > > version of > > > > > > > > > PETSc (running > > > > > > > > > config/configure.py --with-debugging=0)) > > > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 9 Feb 2007, Shi Jin wrote: > > > > > > > > > > > > > > > > > > > Hi there, > > > > > > > > > > > > > > > > > > > > I am tuning our 3D FEM CFD code > > > written > > > > > with > > > > > > > > > PETSc. > > > > > > > > > > The code doesn't scale very well. For > > > > > example, > > > > > > > > > with 8 > > > > > > > > > > processes on a linux cluster, the > > > speedup > > > > > we > > > > > > > > > achieve > > > > > > > > > > with a fairly large problem > > > size(million > > > > > of > > > > > > > > > elements) > > > > > > > > > > is only 3 to 4 using the Congugate > > > > > gradient > > > > > > > > > solver. We > > > > > > > > > > can achieve a speed up of a 6.5 using > > > a > > > > > GMRes > > > > > > > > > solver > > > > > > > > > > but the wall clock time of a GMRes is > > > > > longer > > > > > > > than > > > > > > > > > a CG > > > > > > > > > > solver which indicates that CG is the > > > > > faster > > > > > > > > > solver > > > > > > > > > > and it scales not as good as GMRes. Is > > > > > this > > > > > > > > > generally > > > > > > > > > > true? > > > > > > > > > > > > > > > > > > > > I then went to the examples and find a > > > 2D > > > > > > > example > > > > > > > > > of > > > > > > > > > > KSPSolve (ex2.c). I let the code ran > > > with > > > > > a > > > > > > > > > 1000x1000 > > > > > > > > > > mesh and get a linear scaling of the > > > CG > > > > > solver > > > > > > > and > > > > > > > > > a > > > > > > > > > > super linear scaling of the GMRes. > > > These > > > > > are > > > > > > > both > > > > > > > > > much > > > > > > > > > > better than our code. However, I think > > > the > > > > > 2D > > > > > > > > > nature > > > > > === message truncated === > > > > > > > > > > > ____________________________________________________________________________________ > > Cheap talk? > > Check out Yahoo! Messenger's low PC-to-Phone call rates. > > http://voice.yahoo.com > > > > > > From jinzishuai at yahoo.com Sat Feb 10 16:45:29 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Sat, 10 Feb 2007 14:45:29 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <221628.34725.qm@web36210.mail.mud.yahoo.com> Yes. The results follow. --- Satish Balay wrote: > Can you send the optupt from the following runs. You > can do this with > src/ksp/ksp/examples/tutorials/ex2.c - to keep > things simple. 
> > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary | > egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 1.81198e-06 Average time for zero size MPI_Send(): 5.00679e-06 > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary | > egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 2.00272e-06 Average time for zero size MPI_Send(): 4.05312e-06 > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary | > egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 1.7643e-06 Average time for zero size MPI_Send(): 4.05312e-06 > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary | > egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 2.00272e-06 Average time for zero size MPI_Send(): 4.05312e-06 > petscmpirun -n 2 taskset -c 0,12 ./ex2 -log_summary > | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 1.57356e-06 Average time for zero size MPI_Send(): 5.48363e-06 > petscmpirun -n 2 taskset -c 0,14 ./ex2 -log_summary > | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 2.00272e-06 Average time for zero size MPI_Send(): 4.52995e-06 I also did petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 5.00679e-06 Average time for zero size MPI_Send(): 3.93391e-06 The results are not so different from each other. Also please note, the timing is not exact, some times I got O(1e-5) timings for all cases. I assume these numbers are pretty good, right? Does it indicate that the MPI communication on a SMP machine is very fast? I will do a similar test on a cluster and report it back to the list. Shi ____________________________________________________________________________________ Need Mail bonding? Go to the Yahoo! Mail Q&A for great tips from Yahoo! Answers users. http://answers.yahoo.com/dir/?link=list&sid=396546091 From jinzishuai at yahoo.com Sat Feb 10 17:01:19 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Sat, 10 Feb 2007 15:01:19 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: <221628.34725.qm@web36210.mail.mud.yahoo.com> Message-ID: <539403.87888.qm@web36205.mail.mud.yahoo.com> Here is the test on a linux cluster with gigabit ethernet interconnect. MPI2/output:Average time for MPI_Barrier(): 6.00338e-05 MPI2/output:Average time for zero size MPI_Send(): 5.40018e-05 MPI4/output:Average time for MPI_Barrier(): 0.00806541 MPI4/output:Average time for zero size MPI_Send(): 6.07371e-05 MPI8/output:Average time for MPI_Barrier(): 0.00805483 MPI8/output:Average time for zero size MPI_Send(): 6.97374e-05 Note MPI indicates the run using N processes. It seems that the MPI_Barrier takes a much longer time do finish than one a SMP machine. Is this a load balance issue or is it merely the show of slow communication speed? Thanks. Shi --- Shi Jin wrote: > Yes. The results follow. > --- Satish Balay wrote: > > > Can you send the optupt from the following runs. > You > > can do this with > > src/ksp/ksp/examples/tutorials/ex2.c - to keep > > things simple. 
> > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.81198e-06 > Average time for zero size MPI_Send(): 5.00679e-06 > > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 4.05312e-06 > > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.7643e-06 > Average time for zero size MPI_Send(): 4.05312e-06 > > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 4.05312e-06 > > petscmpirun -n 2 taskset -c 0,12 ./ex2 > -log_summary > > | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.57356e-06 > Average time for zero size MPI_Send(): 5.48363e-06 > > petscmpirun -n 2 taskset -c 0,14 ./ex2 > -log_summary > > | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 4.52995e-06 > I also did > petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary > | > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 5.00679e-06 > Average time for zero size MPI_Send(): 3.93391e-06 > > > The results are not so different from each other. > Also > please note, the timing is not exact, some times I > got > O(1e-5) timings for all cases. > I assume these numbers are pretty good, right? Does > it > indicate that the MPI communication on a SMP machine > is very fast? > I will do a similar test on a cluster and report it > back to the list. > > Shi > > > > > > ____________________________________________________________________________________ > Need Mail bonding? > Go to the Yahoo! Mail Q&A for great tips from Yahoo! > Answers users. > http://answers.yahoo.com/dir/?link=list&sid=396546091 > > ____________________________________________________________________________________ We won't tell. Get more on shows you hate to love (and love to hate): Yahoo! TV's Guilty Pleasures list. http://tv.yahoo.com/collections/265 From bsmith at mcs.anl.gov Sat Feb 10 17:03:49 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 10 Feb 2007 17:03:49 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <539403.87888.qm@web36205.mail.mud.yahoo.com> References: <539403.87888.qm@web36205.mail.mud.yahoo.com> Message-ID: gigabit ethernet has huge latencies; it is not good enough for a cluster. Barry On Sat, 10 Feb 2007, Shi Jin wrote: > Here is the test on a linux cluster with gigabit > ethernet interconnect. > MPI2/output:Average time for MPI_Barrier(): > 6.00338e-05 > MPI2/output:Average time for zero size MPI_Send(): > 5.40018e-05 > MPI4/output:Average time for MPI_Barrier(): 0.00806541 > MPI4/output:Average time for zero size MPI_Send(): > 6.07371e-05 > MPI8/output:Average time for MPI_Barrier(): 0.00805483 > MPI8/output:Average time for zero size MPI_Send(): > 6.97374e-05 > > Note MPI indicates the run using N processes. > It seems that the MPI_Barrier takes a much longer time > do finish than one a SMP machine. Is this a load > balance issue or is it merely the show of slow > communication speed? > Thanks. > Shi > --- Shi Jin wrote: > > > Yes. The results follow. > > --- Satish Balay wrote: > > > > > Can you send the optupt from the following runs. 
> > You > > > can do this with > > > src/ksp/ksp/examples/tutorials/ex2.c - to keep > > > things simple. > > > > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.81198e-06 > > Average time for zero size MPI_Send(): 5.00679e-06 > > > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.00272e-06 > > Average time for zero size MPI_Send(): 4.05312e-06 > > > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.7643e-06 > > Average time for zero size MPI_Send(): 4.05312e-06 > > > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.00272e-06 > > Average time for zero size MPI_Send(): 4.05312e-06 > > > petscmpirun -n 2 taskset -c 0,12 ./ex2 > > -log_summary > > > | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.57356e-06 > > Average time for zero size MPI_Send(): 5.48363e-06 > > > petscmpirun -n 2 taskset -c 0,14 ./ex2 > > -log_summary > > > | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.00272e-06 > > Average time for zero size MPI_Send(): 4.52995e-06 > > I also did > > petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary > > | > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 5.00679e-06 > > Average time for zero size MPI_Send(): 3.93391e-06 > > > > > > The results are not so different from each other. > > Also > > please note, the timing is not exact, some times I > > got > > O(1e-5) timings for all cases. > > I assume these numbers are pretty good, right? Does > > it > > indicate that the MPI communication on a SMP machine > > is very fast? > > I will do a similar test on a cluster and report it > > back to the list. > > > > Shi > > > > > > > > > > > > > ____________________________________________________________________________________ > > Need Mail bonding? > > Go to the Yahoo! Mail Q&A for great tips from Yahoo! > > Answers users. > > > http://answers.yahoo.com/dir/?link=list&sid=396546091 > > > > > > > > > ____________________________________________________________________________________ > We won't tell. Get more on shows you hate to love > (and love to hate): Yahoo! TV's Guilty Pleasures list. > http://tv.yahoo.com/collections/265 > > From jinzishuai at yahoo.com Sat Feb 10 17:10:22 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Sat, 10 Feb 2007 15:10:22 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: <221628.34725.qm@web36210.mail.mud.yahoo.com> Message-ID: <918742.70603.qm@web36202.mail.mud.yahoo.com> Furthermore, I did a multi-process test on the SMP. 
petscmpirun -n 3 taskset -c 0,2,4 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 4.19617e-06 Average time for zero size MPI_Send(): 3.65575e-06 petscmpirun -n 4 taskset -c 0,2,4,6 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 1.75953e-05 Average time for zero size MPI_Send(): 2.44975e-05 petscmpirun -n 5 taskset -c 0,2,4,6,8 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 4.22001e-05 Average time for zero size MPI_Send(): 2.54154e-05 petscmpirun -n 6 taskset -c 0,2,4,6,8,10 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 4.87804e-05 Average time for zero size MPI_Send(): 1.83185e-05 petscmpirun -n 7 taskset -c 0,2,4,6,8,10,12 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 2.37942e-05 Average time for zero size MPI_Send(): 5.00679e-06 petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ./ex2 -ksp_type cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) Average time for MPI_Barrier(): 1.35899e-05 Average time for zero size MPI_Send(): 6.73532e-06 They all seem quite fast. Shi --- Shi Jin wrote: > Yes. The results follow. > --- Satish Balay wrote: > > > Can you send the optupt from the following runs. > You > > can do this with > > src/ksp/ksp/examples/tutorials/ex2.c - to keep > > things simple. > > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.81198e-06 > Average time for zero size MPI_Send(): 5.00679e-06 > > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 4.05312e-06 > > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.7643e-06 > Average time for zero size MPI_Send(): 4.05312e-06 > > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary > | > > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 4.05312e-06 > > petscmpirun -n 2 taskset -c 0,12 ./ex2 > -log_summary > > | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.57356e-06 > Average time for zero size MPI_Send(): 5.48363e-06 > > petscmpirun -n 2 taskset -c 0,14 ./ex2 > -log_summary > > | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.00272e-06 > Average time for zero size MPI_Send(): 4.52995e-06 > I also did > petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary > | > egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 5.00679e-06 > Average time for zero size MPI_Send(): 3.93391e-06 > > > The results are not so different from each other. > Also > please note, the timing is not exact, some times I > got > O(1e-5) timings for all cases. > I assume these numbers are pretty good, right? Does > it > indicate that the MPI communication on a SMP machine > is very fast? > I will do a similar test on a cluster and report it > back to the list. > > Shi > > > > > > ____________________________________________________________________________________ > Need Mail bonding? > Go to the Yahoo! Mail Q&A for great tips from Yahoo! > Answers users. > http://answers.yahoo.com/dir/?link=list&sid=396546091 > > ____________________________________________________________________________________ Yahoo! 
Music Unlimited Access over 1 million songs. http://music.yahoo.com/unlimited From jinzishuai at yahoo.com Sat Feb 10 17:54:53 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Sat, 10 Feb 2007 15:54:53 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <37521.92324.qm@web36202.mail.mud.yahoo.com> I understand but this is our reality. I did the same test on a cluster with infiniband: MPI2/output:Average time for MPI_Barrier(): 9.58443e-06 MPI2/output:Average time for zero size MPI_Send(): 8.9407e-06 MPI4/output:Average time for MPI_Barrier(): 1.93596e-05 MPI4/output:Average time for zero size MPI_Send(): 1.0252e-05 MPI8/output:Average time for MPI_Barrier(): 3.33786e-05 MPI8/output:Average time for zero size MPI_Send(): 1.01328e-05 MPI16/output:Average time for MPI_Barrier(): 4.53949e-05 MPI16/output:Average time for zero size MPI_Send(): 9.87947e-06 The MPI_Barrier problem becomes much better. However, when our code is tested on both clusters (gigabit and infiniband), we don't see much difference in their performance. I attach the log file for a run with 4 processes on this infiniband cluster. Shi --- Barry Smith wrote: > > gigabit ethernet has huge latencies; it is not > good enough for a cluster. > > Barry > > > On Sat, 10 Feb 2007, Shi Jin wrote: > > > Here is the test on a linux cluster with gigabit > > ethernet interconnect. > > MPI2/output:Average time for MPI_Barrier(): > > 6.00338e-05 > > MPI2/output:Average time for zero size MPI_Send(): > > 5.40018e-05 > > MPI4/output:Average time for MPI_Barrier(): > 0.00806541 > > MPI4/output:Average time for zero size MPI_Send(): > > 6.07371e-05 > > MPI8/output:Average time for MPI_Barrier(): > 0.00805483 > > MPI8/output:Average time for zero size MPI_Send(): > > 6.97374e-05 > > > > Note MPI indicates the run using N processes. > > It seems that the MPI_Barrier takes a much longer > time > > do finish than one a SMP machine. Is this a load > > balance issue or is it merely the show of slow > > communication speed? > > Thanks. > > Shi > > --- Shi Jin wrote: > > > > > Yes. The results follow. > > > --- Satish Balay wrote: > > > > > > > Can you send the optupt from the following > runs. > > > You > > > > can do this with > > > > src/ksp/ksp/examples/tutorials/ex2.c - to keep > > > > things simple. 
> > > > > > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 1.81198e-06 > > > Average time for zero size MPI_Send(): > 5.00679e-06 > > > > petscmpirun -n 2 taskset -c 0,4 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 2.00272e-06 > > > Average time for zero size MPI_Send(): > 4.05312e-06 > > > > petscmpirun -n 2 taskset -c 0,6 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 1.7643e-06 > > > Average time for zero size MPI_Send(): > 4.05312e-06 > > > > petscmpirun -n 2 taskset -c 0,8 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 2.00272e-06 > > > Average time for zero size MPI_Send(): > 4.05312e-06 > > > > petscmpirun -n 2 taskset -c 0,12 ./ex2 > > > -log_summary > > > > | egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 1.57356e-06 > > > Average time for zero size MPI_Send(): > 5.48363e-06 > > > > petscmpirun -n 2 taskset -c 0,14 ./ex2 > > > -log_summary > > > > | egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 2.00272e-06 > > > Average time for zero size MPI_Send(): > 4.52995e-06 > > > I also did > > > petscmpirun -n 2 taskset -c 0,10 ./ex2 > -log_summary > > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 5.00679e-06 > > > Average time for zero size MPI_Send(): > 3.93391e-06 > > > > > > > > > The results are not so different from each > other. > > > Also > > > please note, the timing is not exact, some times > I > > > got > > > O(1e-5) timings for all cases. > > > I assume these numbers are pretty good, right? > Does > > > it > > > indicate that the MPI communication on a SMP > machine > > > is very fast? > > > I will do a similar test on a cluster and report > it > > > back to the list. > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Need Mail bonding? > > > Go to the Yahoo! Mail Q&A for great tips from > Yahoo! > > > Answers users. > > > > > > http://answers.yahoo.com/dir/?link=list&sid=396546091 > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > We won't tell. Get more on shows you hate to love > > (and love to hate): Yahoo! TV's Guilty Pleasures > list. > > http://tv.yahoo.com/collections/265 > > > > > > ____________________________________________________________________________________ Any questions? Get answers on any topic at www.Answers.yahoo.com. Try it now. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log-4-infiniband.txt URL: From jinzishuai at yahoo.com Sat Feb 10 18:36:13 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Sat, 10 Feb 2007 16:36:13 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <497379.41444.qm@web36203.mail.mud.yahoo.com> Hi, I am a bit confused at how to interpret the log_summary results. In my previous log files, I logged everything in that solving staging, including constructing the matrix and vector and the KSPSolve. I then specifically change the code so that each KSPSolve() function is tightly included within the PetscLogStagePush() and PetscLogStagePop() pair so that we exclude the other timings and concentrate on the linear solver. 
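In outline, the wrapping looks like the following (a minimal sketch rather than the actual application code; it assumes an already configured KSP object ksp with right-hand side b and solution x, the stage name is arbitrary, and the argument order of PetscLogStageRegister() shown is the one in current releases -- older releases take the stage pointer as the first argument, so check the manual page for the installed version):

    #include "petscksp.h"

    /* Give KSPSolve() its own logging stage so that -log_summary
       reports it separately from assembly and everything else. */
    PetscErrorCode SolveInOwnStage(KSP ksp, Vec b, Vec x)
    {
      static PetscLogStage stage = -1;
      PetscErrorCode       ierr;

      if (stage == -1) {  /* register only once; reuse the stage for every solve */
        ierr = PetscLogStageRegister("KSPSolve only", &stage);CHKERRQ(ierr);
      }
      ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  /* only this call is charged to the stage */
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      return 0;
    }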
In this way, I still get list of 16 functions in that stage, although I only included one (KSPSolve). They are VecDot VecNorm VecCopy VecSet VecAXPY VecAYPX VecScatterBegin VecScatterEnd MatMult MatSolve MatLUFactorNum KSPSetup KSPSolve PCSetUp PCSetUpOnBlocks PCApply Are these functions called by the KSPSolve() (in this case, I used -ksp_type cg). I suppose the only network communications are done in the function calls VecScatterBegin VecScatterEnd If I am to compute the percentage of communication specifically for KSPSolve(), shall I just use the times of VecScatterBegin & VecScatterEnd devided by the time of KSPSolve? Or shall I use MatMult, like Satish did in his previous emails? I am a bit confused. Please advise. Thank you very much. Shi --- Satish Balay wrote: > > Just looking at 8 proc run [diffusion stage] we > have: > > MatMult : 79 sec > MatMultAdd : 2 sec > VecScatterBegin: 17 sec > VecScatterEnd : 51 sec > > So basically the communication in MatMult/Add is > represented by > VecScatters. Here out of 81 sec total - 68 seconds > are used for > communication [with a load imbalance of 11 for > vecscaterend] > > So - I think MPI performance is reducing scalability > here.. > > Things to try: > > * -vecstatter_rr etc options I sugested earlier > > * install mpich with '--with-device=ch3:ssm' and see > if it makes a difference > > Satish > > --- Event Stage 4: Diffusion > > [x]rhsLtP 297 1.0 1.1017e+02 1.5 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 39 0 0 > 0 0 0 > [x]rhsGravity 99 1.0 4.2582e+0083.5 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 > 0 0 0 > VecDot 4657 1.0 2.5748e+01 3.2 7.60e+07 > 3.2 0.0e+00 0.0e+00 4.7e+03 1 1 0 0 6 5 3 0 > 0 65 191 > VecNorm 2477 1.0 2.2109e+01 2.2 3.22e+07 > 2.2 0.0e+00 0.0e+00 2.5e+03 1 0 0 0 3 5 2 0 > 0 35 118 > VecScale 594 1.0 2.9330e-02 1.5 2.61e+08 > 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 1361 > VecCopy 594 1.0 2.7552e-01 1.3 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > VecSet 3665 1.0 6.0793e-01 1.4 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > VecAXPY 5251 1.0 2.5892e+00 1.2 3.31e+08 > 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 4 0 > 0 0 2137 > VecAYPX 1883 1.0 8.6419e-01 1.3 3.62e+08 > 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 > 0 0 2296 > VecScatterBegin 2873 1.0 1.7569e+01 3.0 0.00e+00 > 0.0 3.8e+04 1.6e+05 0.0e+00 1 0 10 20 0 5 > 0100100 0 0 > VecScatterEnd 2774 1.0 5.1519e+0110.9 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 7 0 0 > 0 0 0 > MatMult 2477 1.0 7.9186e+01 2.4 2.34e+08 > 2.4 3.5e+04 1.7e+05 0.0e+00 3 11 9 20 0 20 48 91 > 98 0 850 > MatMultAdd 297 1.0 2.8161e+00 5.4 4.46e+07 > 2.2 3.6e+03 3.4e+04 0.0e+00 0 0 1 0 0 0 0 9 > 2 0 125 > MatSolve 2477 1.0 6.2245e+01 1.2 1.41e+08 > 1.2 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 22 41 0 > 0 0 926 > MatLUFactorNum 3 1.0 2.7686e-01 1.1 2.79e+08 > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 2016 > MatGetRow 19560420 1.0 5.5195e+01 1.6 > 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 > 20 0 0 0 0 0 > KSPSetup 6 1.0 3.0756e-05 2.8 0.00e+00 > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 0 > KSPSolve 297 1.0 1.3142e+02 1.0 1.31e+08 > 1.1 3.1e+04 1.7e+05 7.1e+03 8 22 8 18 9 50 93 80 > 86100 1001 > PCSetUp 6 1.0 2.7700e-01 1.1 2.78e+08 > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 2015 > PCSetUpOnBlocks 297 1.0 2.7794e-01 1.1 2.78e+08 > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > 0 0 2008 > PCApply 2477 1.0 6.2772e+01 1.2 1.39e+08 > 1.2 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 23 41 0 > 0 0 918 > > 
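As a rough illustration with the figures quoted above (all of them max-over-processor times, so the arithmetic is only approximate): the scatters account for about 17.6 s + 51.5 s = 69 s, which is roughly 85% of the 81 s spent in MatMult/MatMultAdd and about half of the 131 s KSPSolve. The MPI_Allreduce-based VecDot (about 26 s) and VecNorm (about 22 s) would raise the communication share further, although part of those timings is local arithmetic and load imbalance rather than message passing.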
____________________________________________________________________________________ It's here! Your new message! Get new email alerts with the free Yahoo! Toolbar. http://tools.search.yahoo.com/toolbar/features/mail/ From bsmith at mcs.anl.gov Sat Feb 10 18:43:33 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 10 Feb 2007 18:43:33 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <497379.41444.qm@web36203.mail.mud.yahoo.com> References: <497379.41444.qm@web36203.mail.mud.yahoo.com> Message-ID: On Sat, 10 Feb 2007, Shi Jin wrote: > Hi, I am a bit confused at how to interpret the > log_summary results. In my previous log files, I > logged everything in that solving staging, including > constructing the matrix and vector and the KSPSolve. > I then specifically change the code so that each > KSPSolve() function is tightly included within the > PetscLogStagePush() and PetscLogStagePop() pair so > that we exclude the other timings and concentrate on > the linear solver. > In this way, I still get list of 16 functions in that > stage, although I only included one (KSPSolve). They > are > VecDot > VecNorm > VecCopy > VecSet > VecAXPY > VecAYPX > VecScatterBegin > VecScatterEnd > MatMult > MatSolve > MatLUFactorNum > KSPSetup > KSPSolve > PCSetUp > PCSetUpOnBlocks > PCApply > Are these functions called by the KSPSolve() (in this > case, I used -ksp_type cg). YES > I suppose the only network communications are done in > the function calls > VecScatterBegin > VecScatterEnd The message passing. VecDot, VecNorm have MPI_Allreduce()s > If I am to compute the percentage of communication > specifically for KSPSolve(), shall I just use the > times of VecScatterBegin & VecScatterEnd devided by > the time of KSPSolve? Or shall I use MatMult, like > Satish did in his previous emails? I am a bit > confused. Please advise. You can do either; using Mult tells you how well the mult is doing in terms of message passing communication. Using ksp tells how in the entire solve. You can add the option -log_sync and it will try to seperate the amount of time in the dot, norm and scatters that is actually spent on communication and how much is spent on synchronization (due to load inbalance). Barry > > Thank you very much. > > Shi > --- Satish Balay wrote: > > > > > Just looking at 8 proc run [diffusion stage] we > > have: > > > > MatMult : 79 sec > > MatMultAdd : 2 sec > > VecScatterBegin: 17 sec > > VecScatterEnd : 51 sec > > > > So basically the communication in MatMult/Add is > > represented by > > VecScatters. Here out of 81 sec total - 68 seconds > > are used for > > communication [with a load imbalance of 11 for > > vecscaterend] > > > > So - I think MPI performance is reducing scalability > > here.. 
> > > > Things to try: > > > > * -vecstatter_rr etc options I sugested earlier > > > > * install mpich with '--with-device=ch3:ssm' and see > > if it makes a difference > > > > Satish > > > > --- Event Stage 4: Diffusion > > > > [x]rhsLtP 297 1.0 1.1017e+02 1.5 0.00e+00 > > 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 39 0 0 > > 0 0 0 > > [x]rhsGravity 99 1.0 4.2582e+0083.5 0.00e+00 > > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 > > 0 0 0 > > VecDot 4657 1.0 2.5748e+01 3.2 7.60e+07 > > 3.2 0.0e+00 0.0e+00 4.7e+03 1 1 0 0 6 5 3 0 > > 0 65 191 > > VecNorm 2477 1.0 2.2109e+01 2.2 3.22e+07 > > 2.2 0.0e+00 0.0e+00 2.5e+03 1 0 0 0 3 5 2 0 > > 0 35 118 > > VecScale 594 1.0 2.9330e-02 1.5 2.61e+08 > > 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 1361 > > VecCopy 594 1.0 2.7552e-01 1.3 0.00e+00 > > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 0 > > VecSet 3665 1.0 6.0793e-01 1.4 0.00e+00 > > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 0 > > VecAXPY 5251 1.0 2.5892e+00 1.2 3.31e+08 > > 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 4 0 > > 0 0 2137 > > VecAYPX 1883 1.0 8.6419e-01 1.3 3.62e+08 > > 1.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 > > 0 0 2296 > > VecScatterBegin 2873 1.0 1.7569e+01 3.0 0.00e+00 > > 0.0 3.8e+04 1.6e+05 0.0e+00 1 0 10 20 0 5 > > 0100100 0 0 > > VecScatterEnd 2774 1.0 5.1519e+0110.9 0.00e+00 > > 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 7 0 0 > > 0 0 0 > > MatMult 2477 1.0 7.9186e+01 2.4 2.34e+08 > > 2.4 3.5e+04 1.7e+05 0.0e+00 3 11 9 20 0 20 48 91 > > 98 0 850 > > MatMultAdd 297 1.0 2.8161e+00 5.4 4.46e+07 > > 2.2 3.6e+03 3.4e+04 0.0e+00 0 0 1 0 0 0 0 9 > > 2 0 125 > > MatSolve 2477 1.0 6.2245e+01 1.2 1.41e+08 > > 1.2 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 22 41 0 > > 0 0 926 > > MatLUFactorNum 3 1.0 2.7686e-01 1.1 2.79e+08 > > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 2016 > > MatGetRow 19560420 1.0 5.5195e+01 1.6 > > 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 > > 20 0 0 0 0 0 > > KSPSetup 6 1.0 3.0756e-05 2.8 0.00e+00 > > 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 0 > > KSPSolve 297 1.0 1.3142e+02 1.0 1.31e+08 > > 1.1 3.1e+04 1.7e+05 7.1e+03 8 22 8 18 9 50 93 80 > > 86100 1001 > > PCSetUp 6 1.0 2.7700e-01 1.1 2.78e+08 > > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 2015 > > PCSetUpOnBlocks 297 1.0 2.7794e-01 1.1 2.78e+08 > > 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 > > 0 0 2008 > > PCApply 2477 1.0 6.2772e+01 1.2 1.39e+08 > > 1.2 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 23 41 0 > > 0 0 918 > > > > > > > > > ____________________________________________________________________________________ > It's here! Your new message! > Get new email alerts with the free Yahoo! Toolbar. > http://tools.search.yahoo.com/toolbar/features/mail/ > > From zonexo at gmail.com Sat Feb 10 19:02:54 2007 From: zonexo at gmail.com (Ben Tay) Date: Sun, 11 Feb 2007 09:02:54 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> Message-ID: <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> Hi, In other words, for my CFD code, it is not possible to parallelize it effectively because the problem is too small? Is these true for all parallel solver, or just PETSc? 
I was hoping to reduce the runtime since mine is an unsteady problem which requires many steps to reach a periodic state and it takes many hours to reach it. Lastly, if I'm running on 2 processors, will there be improvement likely? Thank you. On 2/11/07, Barry Smith wrote: > > > > On Sat, 10 Feb 2007, Ben Tay wrote: > > > Hi, > > > > I've repeated the test with n,m = 800. Now serial takes around 11mins > while > > parallel with 4 processors took 6mins. Does it mean that the problem > must be > > pretty large before it is more superior to use parallel? Moreover > 800x800 > > means there's 640000 unknowns. My problem is a 2D CFD code which > typically > > has 200x80=16000 unknowns. Does it mean that I won't be able to benefit > from > ^^^^^^^^^^^ > You'll never get much performance past 2 processors; its not even worth > all the work of having a parallel code in this case. I'd just optimize the > heck out of the serial code. > > Barry > > > > > running in parallel? > > > > Btw, this is the parallel's log_summary: > > > > > > Event Count Time (sec) > > Flops/sec --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > MatMult 1265 1.0 7.0615e+01 1.2 3.22e+07 1.2 7.6e+03 6.4e+03 > > 0.0e+00 16 11100100 0 16 11100100 0 103 > > MatSolve 1265 1.0 4.7820e+01 1.2 4.60e+07 1.2 0.0e+00 0.0e+00 > > 0.0e+00 11 11 0 0 0 11 11 0 0 0 152 > > MatLUFactorNum 1 1.0 2.5703e-01 2.3 1.27e+07 2.3 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 22 > > MatILUFactorSym 1 1.0 1.8933e-01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 1 1.0 4.2153e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1 1.0 3.6475e-01 1.5 0.00e+00 0.0 6.0e+00 3.2e+03 > > 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 > > MatGetOrdering 1 1.0 1.2088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecMDot 1224 1.0 1.5314e+02 1.2 4.63e+07 1.2 0.0e+00 0.0e+00 > > 1.2e+03 36 36 0 0 31 36 36 0 0 31 158 > > VecNorm 1266 1.0 1.0215e+02 1.1 4.31e+06 1.1 0.0e+00 0.0e+00 > > 1.3e+03 24 2 0 0 33 24 2 0 0 33 16 > > VecScale 1265 1.0 3.7467e+00 1.5 8.34e+07 1.5 0.0e+00 0.0e+00 > > 0.0e+00 1 1 0 0 0 1 1 0 0 0 216 > > VecCopy 41 1.0 2.5530e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 1308 1.0 3.2717e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > VecAXPY 82 1.0 5.3338e-01 2.8 1.40e+08 2.8 0.0e+00 0.0e+00 > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 > > VecMAXPY 1265 1.0 4.6234e+01 1.2 1.74e+08 1.2 0.0e+00 0.0e+00 > > 0.0e+00 10 38 0 0 0 10 38 0 0 0 557 > > VecScatterBegin 1265 1.0 1.5684e-01 1.6 0.00e+00 0.0 7.6e+03 6.4e+03 > > 0.0e+00 0 0100100 0 0 0100100 0 0 > > VecScatterEnd 1265 1.0 4.3167e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 > > VecNormalize 1265 1.0 1.0459e+02 1.1 6.21e+06 1.1 0.0e+00 0.0e+00 > > 1.3e+03 25 4 0 0 32 25 4 0 0 32 23 > > KSPGMRESOrthog 1224 1.0 1.9035e+02 1.1 7.00e+07 1.1 0.0e+00 0.0e+00 > > 1.2e+03 45 72 0 0 31 45 72 0 0 31 254 > > KSPSetup 2 1.0 5.1674e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 1 1.0 4.0269e+02 1.0 4.16e+07 1.0 7.6e+03 6.4e+03 > > 3.9e+03 99100100100 99 99100100100 99 166 > > PCSetUp 2 1.0 4.5924e-01 2.6 8.23e+06 2.6 0.0e+00 0.0e+00 > > 6.0e+00 0 0 0 0 0 0 
0 0 0 0 12 > > PCSetUpOnBlocks 1 1.0 4.5847e-01 2.6 8.26e+06 2.6 0.0e+00 0.0e+00 > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 13 > > PCApply 1265 1.0 5.0990e+01 1.2 4.33e+07 1.2 0.0e+00 0.0e+00 > > 1.3e+03 12 11 0 0 32 12 11 0 0 32 143 > > > ------------------------------------------------------------------------------------------------------------------------ > > > > Memory usage is given in bytes: > > > > Object Type Creations Destructions Memory Descendants' > Mem. > > > > --- Event Stage 0: Main Stage > > > > Matrix 4 4 643208 0 > > Index Set 5 5 1924296 0 > > Vec 41 41 47379984 0 > > Vec Scatter 1 1 0 0 > > Krylov Solver 2 2 16880 0 > > Preconditioner 2 2 196 0 > > > ======================================================================================================================== > > Average time to get PetscTime(): 1.00136e-06 > > Average time for MPI_Barrier(): 4.00066e-05 > > Average time for zero size MPI_Send(): 1.70469e-05 > > OptionTable: -log_summary > > Compiled without FORTRAN kernels > > Compiled with full precision matrices (default) > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > > sizeof(PetscScalar) 8 > > Configure run at: Thu Jan 18 12:23:31 2007 > > Configure options: --with-vendor-compilers=intel --with-x=0 > --with-shared > > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > > --with-mpi-dir=/opt/mpich/myrinet/intel/ > > ----------------------------------------- > > > > > > > > > > > > > > > > On 2/10/07, Ben Tay wrote: > > > > > > Hi, > > > > > > I tried to use ex2f.F as a test code. I've changed the number n,m from > 3 > > > to 500 each. I ran the code using 1 processor and then with 4 > processor. I > > > then repeat the same with the following modification: > > > > > > > > > do i=1,10 > > > > > > call KSPSolve(ksp,b,x,ierr) > > > > > > end do > > > I've added to do loop to make the solving repeat 10 times. > > > > > > In both cases, the serial code is faster, e.g. 1 taking 2.4 min while > the > > > other 3.3 min. 
> > > > > > Here's the log_summary: > > > > > > > > > ---------------------------------------------- PETSc Performance > Summary: > > > ---------------------------------------------- > > > > > > ./ex2f on a linux-mpi named atlas12.nus.edu.sg with 4 processors, by > > > g0306332 Sat Feb 10 16:21:36 2007 > > > Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST > 2007 > > > HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 > > > > > > Max Max/Min Avg Total > > > Time (sec): 2.213e+02 1.00051 2.212e+02 > > > Objects: 5.500e+01 1.00000 5.500e+01 > > > Flops: 4.718e+09 1.00019 4.718e+09 1.887e+10 > > > Flops/sec: 2.134e+07 1.00070 2.133e+07 8.531e+07 > > > > > > Memory: 3.186e+07 1.00069 1.274e+08 > > > MPI Messages: 1.832e+03 2.00000 1.374e+03 5.496e+03 > > > MPI Message Lengths: 7.324e+06 2.00000 3.998e+03 2.197e+07 > > > MPI Reductions: 7.112e+02 1.00000 > > > > > > Flop counting convention: 1 flop = 1 real number operation of type > > > (multiply/divide/add/subtract) > > > e.g., VecAXPY() for real vectors of length > N > > > --> 2N flops > > > and VecAXPY() for complex vectors of > length N > > > --> 8N flops > > > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- > Messages > > > --- -- Message Lengths -- -- Reductions -- > > > Avg %Total Avg %Total counts > > > %Total Avg %Total counts %Total > > > 0: Main Stage: 2.2120e+02 100.0% 1.8871e+10 100.0% 5.496e+03 > > > 100.0% 3.998e+03 100.0% 2.845e+03 100.0% > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > See the 'Profiling' chapter of the users' manual for details on > > > interpreting output. > > > Phase summary info: > > > Count: number of times phase was executed > > > Time and Flops/sec: Max - maximum over all processors > > > Ratio - ratio of maximum to minimum over all > > > processors > > > Mess: number of messages sent > > > Avg. len: average message length > > > Reduct: number of global reductions > > > Global: entire computation > > > Stage: stages of a computation. Set stages with PetscLogStagePush() > and > > > PetscLogStagePop(). > > > %T - percent time in this phase %F - percent flops in > this > > > phase > > > %M - percent messages in this phase %L - percent message > lengths > > > in this phase > > > %R - percent reductions in this phase > > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > > > over all processors) > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > ########################################################## > > > # # > > > # WARNING!!! # > > > # # > > > # This code was compiled with a debugging option, # > > > # To get timing results run config/configure.py # > > > # using --with-debugging=no, the performance will # > > > # be generally two or three times faster. # > > > # # > > > ########################################################## > > > > > > > > > > > > > > > ########################################################## > > > # # > > > # WARNING!!! # > > > # # > > > # This code was run without the PreLoadBegin() # > > > # macros. To get timing results we always recommend # > > > # preloading. otherwise timing numbers may be # > > > # meaningless. 
# > > > ########################################################## > > > > > > > > > Event Count Time (sec) > > > Flops/sec --- Global --- --- Stage --- > Total > > > Max Ratio Max Ratio Max Ratio Mess Avg > len > > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > --- Event Stage 0: Main Stage > > > > > > MatMult 915 1.0 4.4291e+01 1.3 1.50e+07 1.3 5.5e+03 > 4.0e+03 > > > 0.0e+00 18 11100100 0 18 11100100 0 46 > > > MatSolve 915 1.0 1.5684e+01 1.1 3.56e+07 1.1 0.0e+00 > 0.0e+00 > > > 0.0e+00 7 11 0 0 0 7 11 0 0 0 131 > > > MatLUFactorNum 1 1.0 5.1654e-02 1.4 1.48e+07 1.4 0.0e+00 > 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 > > > MatILUFactorSym 1 1.0 1.6838e-02 1.1 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyBegin 1 1.0 3.2428e-01 1.6 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyEnd 1 1.0 1.3120e+00 1.1 0.00e+00 0.0 6.0e+00 > 2.0e+03 > > > 1.3e+01 1 0 0 0 0 1 0 0 0 0 0 > > > MatGetOrdering 1 1.0 4.1590e-03 1.2 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecMDot 885 1.0 8.5091e+01 1.1 2.27e+07 1.1 0.0e+00 > 0.0e+00 > > > 8.8e+02 36 36 0 0 31 36 36 0 0 31 80 > > > VecNorm 916 1.0 6.6747e+01 1.1 1.81e+06 1.1 0.0e+00 > 0.0e+00 > > > 9.2e+02 29 2 0 0 32 29 2 0 0 32 7 > > > VecScale 915 1.0 1.1430e+00 2.2 1.12e+08 2.2 0.0e+00 > 0.0e+00 > > > 0.0e+00 0 1 0 0 0 0 1 0 0 0 200 > > > VecCopy 30 1.0 1.2816e-01 5.7 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecSet 947 1.0 7.8979e-01 1.3 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAXPY 60 1.0 5.5332e-02 1.1 1.51e+08 1.1 0.0e+00 > 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 542 > > > VecMAXPY 915 1.0 1.5004e+01 1.3 1.54e+08 1.3 0.0e+00 > 0.0e+00 > > > 0.0e+00 6 38 0 0 0 6 38 0 0 0 483 > > > VecScatterBegin 915 1.0 9.0358e-02 1.4 0.00e+00 0.0 5.5e+03 > 4.0e+03 > > > 0.0e+00 0 0100100 0 0 0100100 0 0 > > > VecScatterEnd 915 1.0 3.5136e+01 1.4 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 > > > VecNormalize 915 1.0 6.7272e+01 1.0 2.68e+06 1.0 0.0e+00 > 0.0e+00 > > > 9.2e+02 30 4 0 0 32 30 4 0 0 32 10 > > > KSPGMRESOrthog 885 1.0 9.8478e+01 1.1 3.87e+07 1.1 0.0e+00 > 0.0e+00 > > > 8.8e+02 42 72 0 0 31 42 72 0 0 31 138 > > > KSPSetup 2 1.0 6.1918e-01 1.2 0.00e+00 0.0 0.0e+00 > 0.0e+00 > > > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve 1 1.0 2.1892e+02 1.0 2.15e+07 1.0 5.5e+03 > 4.0e+03 > > > 2.8e+03 99100100100 99 99100100100 99 86 > > > PCSetUp 2 1.0 7.3292e-02 1.3 9.84e+06 1.3 0.0e+00 > 0.0e+00 > > > 6.0e+00 0 0 0 0 0 0 0 0 0 0 30 > > > PCSetUpOnBlocks 1 1.0 7.2706e-02 1.3 9.97e+06 1.3 0.0e+00 > 0.0e+00 > > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 31 > > > PCApply 915 1.0 1.6508e+01 1.1 3.27e+07 1.1 0.0e+00 > 0.0e+00 > > > 9.2e+02 7 11 0 0 32 7 11 0 0 32 124 > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > > > > Memory usage is given in bytes: > > > > > > Object Type Creations Destructions Memory Descendants' > Mem. 
> > > > > > --- Event Stage 0: Main Stage > > > > > > Matrix 4 4 252008 0 > > > Index Set 5 5 753096 0 > > > Vec 41 41 18519984 0 > > > Vec Scatter 1 1 0 0 > > > Krylov Solver 2 2 16880 0 > > > Preconditioner 2 2 196 0 > > > > ======================================================================================================================== > > > > > > Average time to get PetscTime(): 1.09673e-06 > > > Average time for MPI_Barrier(): 4.18186e-05 > > > Average time for zero size MPI_Send(): 2.62856e-05 > > > OptionTable: -log_summary > > > Compiled without FORTRAN kernels > > > Compiled with full precision matrices (default) > > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > > > sizeof(PetscScalar) 8 > > > Configure run at: Thu Jan 18 12:23:31 2007 > > > Configure options: --with-vendor-compilers=intel --with-x=0 > --with-shared > > > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > > > --with-mpi-dir=/opt/mpich/myrinet/intel/ > > > ----------------------------------------- > > > Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on > atlas1.nus.edu.sg > > > Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 > SMP > > > Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux > > > Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 > > > Using PETSc arch: linux-mpif90 > > > ----------------------------------------- > > > Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > > > Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC > -g > > > -w90 -w > > > ----------------------------------------- > > > Using include paths: -I/nas/lsftmp/g0306332/petsc- > > > 2.3.2-p8-I/nas/lsftmp/g0306332/petsc- > > > 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8 > /include > > > -I/opt/mpich/myrinet/intel/include > > > ------------------------------------------ > > > Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > > > Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > > > -w90 -w > > > Using libraries: > > > -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 > > > -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts > > > -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc > > > -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 > > > -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide > > > -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > > -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib > > > -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm > > > -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -L/opt/intel/compiler70/ia32/lib > > > -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts > -lcxa > > > -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib > -lPEPCF90 > > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -L/opt/intel/compiler70/ia32/lib > > > -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 > -lm -Wl,-rpath,\ > > > -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread > > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib > -L/opt/intel/compiler70/ia32/lib > > > -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl > > > ------------------------------------------ > > > > > > So is there something wrong with the server's mpi implementation? > > > > > > Thank you. 
> > > > > > > > > > > > On 2/10/07, Satish Balay wrote: > > > > > > > > Looks like MatMult = 24sec Out of this the scatter time is: 22sec. > > > > Either something is wrong with your run - or MPI is really broken.. > > > > > > > > Satish > > > > > > > > > > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 > 2.4e+04 > > > > 1.3e+03 > > > > > > > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 > 2.4e+04 > > > > 1.3e+03 > > > > > > > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 > 0.0e+00 > > > > 0.0e+00 > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat Feb 10 21:26:07 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 10 Feb 2007 21:26:07 -0600 (CST) Subject: understanding the output from -info In-Reply-To: <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702090816qb6d1325g1d311a0eb53eec26@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> Message-ID: My recommendation is just to try to optimize sequential runs by using the most appropriate solver algorithms, the best sequential processor with the fastest memory and slickest code. Parallel computing is to solve big problems, not to solve little problems fast. (anything less then 100k unknowns or even more is in my opinion is small). Barry On Sun, 11 Feb 2007, Ben Tay wrote: > Hi, > > In other words, for my CFD code, it is not possible to parallelize it > effectively because the problem is too small? > > Is these true for all parallel solver, or just PETSc? I was hoping to reduce > the runtime since mine is an unsteady problem which requires many steps to > reach a periodic state and it takes many hours to reach it. > > Lastly, if I'm running on 2 processors, will there be improvement likely? > > Thank you. > > > On 2/11/07, Barry Smith wrote: > > > > > > > > On Sat, 10 Feb 2007, Ben Tay wrote: > > > > > Hi, > > > > > > I've repeated the test with n,m = 800. Now serial takes around 11mins > > while > > > parallel with 4 processors took 6mins. Does it mean that the problem > > must be > > > pretty large before it is more superior to use parallel? Moreover > > 800x800 > > > means there's 640000 unknowns. My problem is a 2D CFD code which > > typically > > > has 200x80=16000 unknowns. Does it mean that I won't be able to benefit > > from > > ^^^^^^^^^^^ > > You'll never get much performance past 2 processors; its not even worth > > all the work of having a parallel code in this case. I'd just optimize the > > heck out of the serial code. > > > > Barry > > > > > > > > > running in parallel? 
> > > > > > Btw, this is the parallel's log_summary: > > > > > > > > > Event Count Time (sec) > > > Flops/sec --- Global --- --- Stage --- Total > > > Max Ratio Max Ratio Max Ratio Mess Avg len > > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > --- Event Stage 0: Main Stage > > > > > > MatMult 1265 1.0 7.0615e+01 1.2 3.22e+07 1.2 7.6e+03 6.4e+03 > > > 0.0e+00 16 11100100 0 16 11100100 0 103 > > > MatSolve 1265 1.0 4.7820e+01 1.2 4.60e+07 1.2 0.0e+00 0.0e+00 > > > 0.0e+00 11 11 0 0 0 11 11 0 0 0 152 > > > MatLUFactorNum 1 1.0 2.5703e-01 2.3 1.27e+07 2.3 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 22 > > > MatILUFactorSym 1 1.0 1.8933e-01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyBegin 1 1.0 4.2153e-01 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyEnd 1 1.0 3.6475e-01 1.5 0.00e+00 0.0 6.0e+00 3.2e+03 > > > 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetOrdering 1 1.0 1.2088e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecMDot 1224 1.0 1.5314e+02 1.2 4.63e+07 1.2 0.0e+00 0.0e+00 > > > 1.2e+03 36 36 0 0 31 36 36 0 0 31 158 > > > VecNorm 1266 1.0 1.0215e+02 1.1 4.31e+06 1.1 0.0e+00 0.0e+00 > > > 1.3e+03 24 2 0 0 33 24 2 0 0 33 16 > > > VecScale 1265 1.0 3.7467e+00 1.5 8.34e+07 1.5 0.0e+00 0.0e+00 > > > 0.0e+00 1 1 0 0 0 1 1 0 0 0 216 > > > VecCopy 41 1.0 2.5530e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecSet 1308 1.0 3.2717e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > > > VecAXPY 82 1.0 5.3338e-01 2.8 1.40e+08 2.8 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 197 > > > VecMAXPY 1265 1.0 4.6234e+01 1.2 1.74e+08 1.2 0.0e+00 0.0e+00 > > > 0.0e+00 10 38 0 0 0 10 38 0 0 0 557 > > > VecScatterBegin 1265 1.0 1.5684e-01 1.6 0.00e+00 0.0 7.6e+03 6.4e+03 > > > 0.0e+00 0 0100100 0 0 0100100 0 0 > > > VecScatterEnd 1265 1.0 4.3167e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 > > > VecNormalize 1265 1.0 1.0459e+02 1.1 6.21e+06 1.1 0.0e+00 0.0e+00 > > > 1.3e+03 25 4 0 0 32 25 4 0 0 32 23 > > > KSPGMRESOrthog 1224 1.0 1.9035e+02 1.1 7.00e+07 1.1 0.0e+00 0.0e+00 > > > 1.2e+03 45 72 0 0 31 45 72 0 0 31 254 > > > KSPSetup 2 1.0 5.1674e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve 1 1.0 4.0269e+02 1.0 4.16e+07 1.0 7.6e+03 6.4e+03 > > > 3.9e+03 99100100100 99 99100100100 99 166 > > > PCSetUp 2 1.0 4.5924e-01 2.6 8.23e+06 2.6 0.0e+00 0.0e+00 > > > 6.0e+00 0 0 0 0 0 0 0 0 0 0 12 > > > PCSetUpOnBlocks 1 1.0 4.5847e-01 2.6 8.26e+06 2.6 0.0e+00 0.0e+00 > > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 13 > > > PCApply 1265 1.0 5.0990e+01 1.2 4.33e+07 1.2 0.0e+00 0.0e+00 > > > 1.3e+03 12 11 0 0 32 12 11 0 0 32 143 > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > Memory usage is given in bytes: > > > > > > Object Type Creations Destructions Memory Descendants' > > Mem. 
> > > > > > --- Event Stage 0: Main Stage > > > > > > Matrix 4 4 643208 0 > > > Index Set 5 5 1924296 0 > > > Vec 41 41 47379984 0 > > > Vec Scatter 1 1 0 0 > > > Krylov Solver 2 2 16880 0 > > > Preconditioner 2 2 196 0 > > > > > ======================================================================================================================== > > > Average time to get PetscTime(): 1.00136e-06 > > > Average time for MPI_Barrier(): 4.00066e-05 > > > Average time for zero size MPI_Send(): 1.70469e-05 > > > OptionTable: -log_summary > > > Compiled without FORTRAN kernels > > > Compiled with full precision matrices (default) > > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > > > sizeof(PetscScalar) 8 > > > Configure run at: Thu Jan 18 12:23:31 2007 > > > Configure options: --with-vendor-compilers=intel --with-x=0 > > --with-shared > > > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > > > --with-mpi-dir=/opt/mpich/myrinet/intel/ > > > ----------------------------------------- > > > > > > > > > > > > > > > > > > > > > > > > On 2/10/07, Ben Tay wrote: > > > > > > > > Hi, > > > > > > > > I tried to use ex2f.F as a test code. I've changed the number n,m from > > 3 > > > > to 500 each. I ran the code using 1 processor and then with 4 > > processor. I > > > > then repeat the same with the following modification: > > > > > > > > > > > > do i=1,10 > > > > > > > > call KSPSolve(ksp,b,x,ierr) > > > > > > > > end do > > > > I've added to do loop to make the solving repeat 10 times. > > > > > > > > In both cases, the serial code is faster, e.g. 1 taking 2.4 min while > > the > > > > other 3.3 min. > > > > > > > > Here's the log_summary: > > > > > > > > > > > > ---------------------------------------------- PETSc Performance > > Summary: > > > > ---------------------------------------------- > > > > > > > > ./ex2f on a linux-mpi named atlas12.nus.edu.sg with 4 processors, by > > > > g0306332 Sat Feb 10 16:21:36 2007 > > > > Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST > > 2007 > > > > HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 > > > > > > > > Max Max/Min Avg Total > > > > Time (sec): 2.213e+02 1.00051 2.212e+02 > > > > Objects: 5.500e+01 1.00000 5.500e+01 > > > > Flops: 4.718e+09 1.00019 4.718e+09 1.887e+10 > > > > Flops/sec: 2.134e+07 1.00070 2.133e+07 8.531e+07 > > > > > > > > Memory: 3.186e+07 1.00069 1.274e+08 > > > > MPI Messages: 1.832e+03 2.00000 1.374e+03 5.496e+03 > > > > MPI Message Lengths: 7.324e+06 2.00000 3.998e+03 2.197e+07 > > > > MPI Reductions: 7.112e+02 1.00000 > > > > > > > > Flop counting convention: 1 flop = 1 real number operation of type > > > > (multiply/divide/add/subtract) > > > > e.g., VecAXPY() for real vectors of length > > N > > > > --> 2N flops > > > > and VecAXPY() for complex vectors of > > length N > > > > --> 8N flops > > > > > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- > > Messages > > > > --- -- Message Lengths -- -- Reductions -- > > > > Avg %Total Avg %Total counts > > > > %Total Avg %Total counts %Total > > > > 0: Main Stage: 2.2120e+02 100.0% 1.8871e+10 100.0% 5.496e+03 > > > > 100.0% 3.998e+03 100.0% 2.845e+03 100.0% > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > See the 'Profiling' chapter of the users' manual for details on > > > > interpreting output. 
> > > > Phase summary info: > > > > Count: number of times phase was executed > > > > Time and Flops/sec: Max - maximum over all processors > > > > Ratio - ratio of maximum to minimum over all > > > > processors > > > > Mess: number of messages sent > > > > Avg. len: average message length > > > > Reduct: number of global reductions > > > > Global: entire computation > > > > Stage: stages of a computation. Set stages with PetscLogStagePush() > > and > > > > PetscLogStagePop(). > > > > %T - percent time in this phase %F - percent flops in > > this > > > > phase > > > > %M - percent messages in this phase %L - percent message > > lengths > > > > in this phase > > > > %R - percent reductions in this phase > > > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > > > > over all processors) > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > > > ########################################################## > > > > # # > > > > # WARNING!!! # > > > > # # > > > > # This code was compiled with a debugging option, # > > > > # To get timing results run config/configure.py # > > > > # using --with-debugging=no, the performance will # > > > > # be generally two or three times faster. # > > > > # # > > > > ########################################################## > > > > > > > > > > > > > > > > > > > > ########################################################## > > > > # # > > > > # WARNING!!! # > > > > # # > > > > # This code was run without the PreLoadBegin() # > > > > # macros. To get timing results we always recommend # > > > > # preloading. otherwise timing numbers may be # > > > > # meaningless. # > > > > ########################################################## > > > > > > > > > > > > Event Count Time (sec) > > > > Flops/sec --- Global --- --- Stage --- > > Total > > > > Max Ratio Max Ratio Max Ratio Mess Avg > > len > > > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > > > --- Event Stage 0: Main Stage > > > > > > > > MatMult 915 1.0 4.4291e+01 1.3 1.50e+07 1.3 5.5e+03 > > 4.0e+03 > > > > 0.0e+00 18 11100100 0 18 11100100 0 46 > > > > MatSolve 915 1.0 1.5684e+01 1.1 3.56e+07 1.1 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 7 11 0 0 0 7 11 0 0 0 131 > > > > MatLUFactorNum 1 1.0 5.1654e-02 1.4 1.48e+07 1.4 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 > > > > MatILUFactorSym 1 1.0 1.6838e-02 1.1 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > MatAssemblyBegin 1 1.0 3.2428e-01 1.6 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > MatAssemblyEnd 1 1.0 1.3120e+00 1.1 0.00e+00 0.0 6.0e+00 > > 2.0e+03 > > > > 1.3e+01 1 0 0 0 0 1 0 0 0 0 0 > > > > MatGetOrdering 1 1.0 4.1590e-03 1.2 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > VecMDot 885 1.0 8.5091e+01 1.1 2.27e+07 1.1 0.0e+00 > > 0.0e+00 > > > > 8.8e+02 36 36 0 0 31 36 36 0 0 31 80 > > > > VecNorm 916 1.0 6.6747e+01 1.1 1.81e+06 1.1 0.0e+00 > > 0.0e+00 > > > > 9.2e+02 29 2 0 0 32 29 2 0 0 32 7 > > > > VecScale 915 1.0 1.1430e+00 2.2 1.12e+08 2.2 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 0 1 0 0 0 0 1 0 0 0 200 > > > > VecCopy 30 1.0 1.2816e-01 5.7 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > VecSet 947 1.0 7.8979e-01 1.3 0.00e+00 0.0 
0.0e+00 > > 0.0e+00 > > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > > VecAXPY 60 1.0 5.5332e-02 1.1 1.51e+08 1.1 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 542 > > > > VecMAXPY 915 1.0 1.5004e+01 1.3 1.54e+08 1.3 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 6 38 0 0 0 6 38 0 0 0 483 > > > > VecScatterBegin 915 1.0 9.0358e-02 1.4 0.00e+00 0.0 5.5e+03 > > 4.0e+03 > > > > 0.0e+00 0 0100100 0 0 0100100 0 0 > > > > VecScatterEnd 915 1.0 3.5136e+01 1.4 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 > > > > VecNormalize 915 1.0 6.7272e+01 1.0 2.68e+06 1.0 0.0e+00 > > 0.0e+00 > > > > 9.2e+02 30 4 0 0 32 30 4 0 0 32 10 > > > > KSPGMRESOrthog 885 1.0 9.8478e+01 1.1 3.87e+07 1.1 0.0e+00 > > 0.0e+00 > > > > 8.8e+02 42 72 0 0 31 42 72 0 0 31 138 > > > > KSPSetup 2 1.0 6.1918e-01 1.2 0.00e+00 0.0 0.0e+00 > > 0.0e+00 > > > > 1.0e+01 0 0 0 0 0 0 0 0 0 0 0 > > > > KSPSolve 1 1.0 2.1892e+02 1.0 2.15e+07 1.0 5.5e+03 > > 4.0e+03 > > > > 2.8e+03 99100100100 99 99100100100 99 86 > > > > PCSetUp 2 1.0 7.3292e-02 1.3 9.84e+06 1.3 0.0e+00 > > 0.0e+00 > > > > 6.0e+00 0 0 0 0 0 0 0 0 0 0 30 > > > > PCSetUpOnBlocks 1 1.0 7.2706e-02 1.3 9.97e+06 1.3 0.0e+00 > > 0.0e+00 > > > > 4.0e+00 0 0 0 0 0 0 0 0 0 0 31 > > > > PCApply 915 1.0 1.6508e+01 1.1 3.27e+07 1.1 0.0e+00 > > 0.0e+00 > > > > 9.2e+02 7 11 0 0 32 7 11 0 0 32 124 > > > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > > > > > > > Memory usage is given in bytes: > > > > > > > > Object Type Creations Destructions Memory Descendants' > > Mem. > > > > > > > > --- Event Stage 0: Main Stage > > > > > > > > Matrix 4 4 252008 0 > > > > Index Set 5 5 753096 0 > > > > Vec 41 41 18519984 0 > > > > Vec Scatter 1 1 0 0 > > > > Krylov Solver 2 2 16880 0 > > > > Preconditioner 2 2 196 0 > > > > > > ======================================================================================================================== > > > > > > > > Average time to get PetscTime(): 1.09673e-06 > > > > Average time for MPI_Barrier(): 4.18186e-05 > > > > Average time for zero size MPI_Send(): 2.62856e-05 > > > > OptionTable: -log_summary > > > > Compiled without FORTRAN kernels > > > > Compiled with full precision matrices (default) > > > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 > > > > sizeof(PetscScalar) 8 > > > > Configure run at: Thu Jan 18 12:23:31 2007 > > > > Configure options: --with-vendor-compilers=intel --with-x=0 > > --with-shared > > > > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > > > > --with-mpi-dir=/opt/mpich/myrinet/intel/ > > > > ----------------------------------------- > > > > Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on > > atlas1.nus.edu.sg > > > > Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 > > SMP > > > > Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux > > > > Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8 > > > > Using PETSc arch: linux-mpif90 > > > > ----------------------------------------- > > > > Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > > > > Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. 
-fPIC > > -g > > > > -w90 -w > > > > ----------------------------------------- > > > > Using include paths: -I/nas/lsftmp/g0306332/petsc- > > > > 2.3.2-p8-I/nas/lsftmp/g0306332/petsc- > > > > 2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8 > > /include > > > > -I/opt/mpich/myrinet/intel/include > > > > ------------------------------------------ > > > > Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g > > > > Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g > > > > -w90 -w > > > > Using libraries: > > > > -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 > > > > -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts > > > > -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc > > > > -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32 > > > > -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide > > > > -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > > > -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib > > > > -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm > > > > -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > -L/opt/intel/compiler70/ia32/lib > > > > -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts > > -lcxa > > > > -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib > > -lPEPCF90 > > > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > -L/opt/intel/compiler70/ia32/lib > > > > -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 > > -lm -Wl,-rpath,\ > > > > -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread > > > > -Wl,-rpath,/opt/intel/compiler70/ia32/lib > > -L/opt/intel/compiler70/ia32/lib > > > > -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl > > > > ------------------------------------------ > > > > > > > > So is there something wrong with the server's mpi implementation? > > > > > > > > Thank you. > > > > > > > > > > > > > > > > On 2/10/07, Satish Balay wrote: > > > > > > > > > > Looks like MatMult = 24sec Out of this the scatter time is: 22sec. > > > > > Either something is wrong with your run - or MPI is really broken.. > > > > > > > > > > Satish > > > > > > > > > > > > > MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 > > 2.4e+04 > > > > > 1.3e+03 > > > > > > > > VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 > > 2.4e+04 > > > > > 1.3e+03 > > > > > > > > VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 > > 0.0e+00 > > > > > 0.0e+00 > > > > > > > > > > > > > > > > > > > > > > From dalcinl at gmail.com Sat Feb 10 22:03:20 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Sun, 11 Feb 2007 01:03:20 -0300 Subject: understanding the output from -info In-Reply-To: <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> Message-ID: On 2/10/07, Ben Tay wrote: > In other words, for my CFD code, it is not possible to parallelize it > effectively because the problem is too small? > > Is these true for all parallel solver, or just PETSc? I was hoping to reduce > the runtime since mine is an unsteady problem which requires many steps to > reach a periodic state and it takes many hours to reach it. 
Can you describe your specific application and how are you solving it? As Barry said, your need-for-speed is not likely to be solved by running in parallel. -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From zonexo at gmail.com Sat Feb 10 23:41:31 2007 From: zonexo at gmail.com (Ben Tay) Date: Sun, 11 Feb 2007 13:41:31 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> Message-ID: <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> Well, I am simulating unsteady flow past a moving airfoil at Re~10^4. I'm using fractional step FVM, which means that I need to solve a momentum and poisson equation. To reach a periodic state takes quite a few hours and so I'm trying to find ways to speed up the process. I thought parallelizing the code would help but it seems like it's not the case. I'm now trying out different types of solver/preconditioner available on PETSc to assess their performance. Is there other external solvers, which PETSc interfaces, which are recommended? I'm thinking of using multigrid to solve the poisson eqn... wonder if hypre/BoomerAMG etc would help... On 2/11/07, Lisandro Dalcin wrote: > > On 2/10/07, Ben Tay wrote: > > In other words, for my CFD code, it is not possible to parallelize it > > effectively because the problem is too small? > > > > Is these true for all parallel solver, or just PETSc? I was hoping to > reduce > > the runtime since mine is an unsteady problem which requires many steps > to > > reach a periodic state and it takes many hours to reach it. > > Can you describe your specific application and how are you solving it? > As Barry said, your need-for-speed is not likely to be solved by > running in parallel. > > > -- > Lisandro Dalc?n > --------------- > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > Tel/Fax: +54-(0)342-451.1594 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Feb 11 10:42:11 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 11 Feb 2007 10:42:11 -0600 (CST) Subject: understanding the output from -info In-Reply-To: <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702091651h6265a510jf5d4ca46cd526876@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> Message-ID: hypre/boomeramg may be the way to go, especially for the Poisson problem. -pc_type hypre -pc_hypre_type boomeramg (-help for lots of tuning options.). 
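For example, assuming the code already calls KSPSetFromOptions() (the executable name and process count below are only placeholders), a run along these lines selects BoomerAMG at runtime and prints the usual performance summary:

    petscmpirun -n 4 ./a.out -pc_type hypre -pc_hypre_type boomeramg -log_summary
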
Barry On Sun, 11 Feb 2007, Ben Tay wrote: > Well, > > I am simulating unsteady flow past a moving airfoil at Re~10^4. I'm using > fractional step FVM, which means that I need to solve a momentum and poisson > equation. > > To reach a periodic state takes quite a few hours and so I'm trying to find > ways to speed up the process. I thought parallelizing the code would help > but it seems like it's not the case. > > I'm now trying out different types of solver/preconditioner available on > PETSc to assess their performance. Is there other external solvers, which > PETSc interfaces, which are recommended? I'm thinking of using multigrid to > solve the poisson eqn... wonder if hypre/BoomerAMG etc would help... > > > On 2/11/07, Lisandro Dalcin wrote: > > > > On 2/10/07, Ben Tay wrote: > > > In other words, for my CFD code, it is not possible to parallelize it > > > effectively because the problem is too small? > > > > > > Is these true for all parallel solver, or just PETSc? I was hoping to > > reduce > > > the runtime since mine is an unsteady problem which requires many steps > > to > > > reach a periodic state and it takes many hours to reach it. > > > > Can you describe your specific application and how are you solving it? > > As Barry said, your need-for-speed is not likely to be solved by > > running in parallel. > > > > > > -- > > Lisandro Dalc??n > > --------------- > > Centro Internacional de M??todos Computacionales en Ingenier??a (CIMEC) > > Instituto de Desarrollo Tecnol??gico para la Industria Qu??mica (INTEC) > > Consejo Nacional de Investigaciones Cient??ficas y T??cnicas (CONICET) > > PTLC - G??emes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > > From zonexo at gmail.com Sun Feb 11 18:26:26 2007 From: zonexo at gmail.com (Ben Tay) Date: Mon, 12 Feb 2007 08:26:26 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> Message-ID: <804ab5d40702111626p2cbbf495ma954bcda1e3b75e8@mail.gmail.com> Hi, I have some questions regarding the use of hypre/boomeramg: 1. Is there anything I need to change in the assembly of matrix etc besides adding -pc_type hypre -pc_hypre_type boomeramg ? 2. Can it work in a sequential code? 3. I have 2 eqns to solve - momentum and poisson. if I used the options, will both equations be solved using hypre? Can I select which solver to solve with which equation? Thank you. On 2/12/07, Barry Smith wrote: > > > hypre/boomeramg may be the way to go, especially for the Poisson > problem. -pc_type hypre -pc_hypre_type boomeramg (-help for lots of > tuning options.). > > Barry > > > On Sun, 11 Feb 2007, Ben Tay wrote: > > > Well, > > > > I am simulating unsteady flow past a moving airfoil at Re~10^4. I'm > using > > fractional step FVM, which means that I need to solve a momentum and > poisson > > equation. > > > > To reach a periodic state takes quite a few hours and so I'm trying to > find > > ways to speed up the process. I thought parallelizing the code would > help > > but it seems like it's not the case. > > > > I'm now trying out different types of solver/preconditioner available on > > PETSc to assess their performance. Is there other external solvers, > which > > PETSc interfaces, which are recommended? 
I'm thinking of using multigrid > to > > solve the poisson eqn... wonder if hypre/BoomerAMG etc would help... > > > > > > On 2/11/07, Lisandro Dalcin wrote: > > > > > > On 2/10/07, Ben Tay wrote: > > > > In other words, for my CFD code, it is not possible to parallelize > it > > > > effectively because the problem is too small? > > > > > > > > Is these true for all parallel solver, or just PETSc? I was hoping > to > > > reduce > > > > the runtime since mine is an unsteady problem which requires many > steps > > > to > > > > reach a periodic state and it takes many hours to reach it. > > > > > > Can you describe your specific application and how are you solving it? > > > As Barry said, your need-for-speed is not likely to be solved by > > > running in parallel. > > > > > > > > > -- > > > Lisandro Dalc?n > > > --------------- > > > Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Feb 11 18:50:16 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 11 Feb 2007 18:50:16 -0600 (CST) Subject: understanding the output from -info In-Reply-To: <804ab5d40702111626p2cbbf495ma954bcda1e3b75e8@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> <804ab5d40702111626p2cbbf495ma954bcda1e3b75e8@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007, Ben Tay wrote: > Hi, > > I have some questions regarding the use of hypre/boomeramg: > > 1. Is there anything I need to change in the assembly of matrix etc besides > adding -pc_type hypre -pc_hypre_type boomeramg ? No > > 2. Can it work in a sequential code? yes > > 3. I have 2 eqns to solve - momentum and poisson. if I used the options, > will both equations be solved using hypre? yes > Can I select which solver to > solve with which equation? yes. For each KSP call KSPSetOptionsPrefix() for example KSPSetOptionsPrefix(kspmo,"momentum); KSPSetOptionsPrefix(ksppo,"poisson"); then from the command line use -momentum_ksp_type gmres -poisson_ksp_type cg -momentum_pc_type lusomething etc. For any solver option. Barry > > Thank you. > > > On 2/12/07, Barry Smith wrote: > > > > > > hypre/boomeramg may be the way to go, especially for the Poisson > > problem. -pc_type hypre -pc_hypre_type boomeramg (-help for lots of > > tuning options.). > > > > Barry > > > > > > On Sun, 11 Feb 2007, Ben Tay wrote: > > > > > Well, > > > > > > I am simulating unsteady flow past a moving airfoil at Re~10^4. I'm > > using > > > fractional step FVM, which means that I need to solve a momentum and > > poisson > > > equation. > > > > > > To reach a periodic state takes quite a few hours and so I'm trying to > > find > > > ways to speed up the process. I thought parallelizing the code would > > help > > > but it seems like it's not the case. > > > > > > I'm now trying out different types of solver/preconditioner available on > > > PETSc to assess their performance. 
Is there other external solvers, > > which > > > PETSc interfaces, which are recommended? I'm thinking of using multigrid > > to > > > solve the poisson eqn... wonder if hypre/BoomerAMG etc would help... > > > > > > > > > On 2/11/07, Lisandro Dalcin wrote: > > > > > > > > On 2/10/07, Ben Tay wrote: > > > > > In other words, for my CFD code, it is not possible to parallelize > > it > > > > > effectively because the problem is too small? > > > > > > > > > > Is these true for all parallel solver, or just PETSc? I was hoping > > to > > > > reduce > > > > > the runtime since mine is an unsteady problem which requires many > > steps > > > > to > > > > > reach a periodic state and it takes many hours to reach it. > > > > > > > > Can you describe your specific application and how are you solving it? > > > > As Barry said, your need-for-speed is not likely to be solved by > > > > running in parallel. > > > > > > > > > > > > -- > > > > Lisandro Dalc??n > > > > --------------- > > > > Centro Internacional de M??todos Computacionales en Ingenier??a (CIMEC) > > > > Instituto de Desarrollo Tecnol??gico para la Industria Qu??mica (INTEC) > > > > Consejo Nacional de Investigaciones Cient??ficas y T??cnicas (CONICET) > > > > PTLC - G??emes 3450, (3000) Santa Fe, Argentina > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > From zonexo at gmail.com Sun Feb 11 21:21:48 2007 From: zonexo at gmail.com (Ben Tay) Date: Mon, 12 Feb 2007 11:21:48 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> <804ab5d40702111626p2cbbf495ma954bcda1e3b75e8@mail.gmail.com> Message-ID: <804ab5d40702111921q2767248dte540b04e38a71236@mail.gmail.com> Hi, I tried to compile PETSc again and using --download-hypre=1. My command given is ./config/configure.py --with-vendor-compilers=intel --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/ --wit h-x=0 --with-shared --with-mpi-dir=/opt/mpich/myrinet/intel/ --with-debugging=0 --download-hypre=1 I tried twice and the same error msg appears: Downloaded hypre could not be used. Please check install in /nas/lsftmp/g0306332/petsc-2.3.2-p8/externalpackages/hypre-1.11.1b/linux-hypre. I've attached the configure.log for your reference. Thank you. On 2/12/07, Barry Smith wrote: > > > > On Mon, 12 Feb 2007, Ben Tay wrote: > > > Hi, > > > > I have some questions regarding the use of hypre/boomeramg: > > > > 1. Is there anything I need to change in the assembly of matrix etc > besides > > adding -pc_type hypre -pc_hypre_type boomeramg ? > > No > > > > 2. Can it work in a sequential code? > > yes > > > > 3. I have 2 eqns to solve - momentum and poisson. if I used the options, > > will both equations be solved using hypre? > > yes > > > Can I select which solver to > > solve with which equation? > > yes. For each KSP call KSPSetOptionsPrefix() for example > KSPSetOptionsPrefix(kspmo,"momentum); > KSPSetOptionsPrefix(ksppo,"poisson"); > then from the command line use -momentum_ksp_type gmres -poisson_ksp_type > cg > -momentum_pc_type lusomething etc. For any solver option. > > Barry > > > > > Thank you. > > > > > > On 2/12/07, Barry Smith wrote: > > > > > > > > > hypre/boomeramg may be the way to go, especially for the Poisson > > > problem. 
-pc_type hypre -pc_hypre_type boomeramg (-help for lots of > > > tuning options.). > > > > > > Barry > > > > > > > > > On Sun, 11 Feb 2007, Ben Tay wrote: > > > > > > > Well, > > > > > > > > I am simulating unsteady flow past a moving airfoil at Re~10^4. I'm > > > using > > > > fractional step FVM, which means that I need to solve a momentum and > > > poisson > > > > equation. > > > > > > > > To reach a periodic state takes quite a few hours and so I'm trying > to > > > find > > > > ways to speed up the process. I thought parallelizing the code would > > > help > > > > but it seems like it's not the case. > > > > > > > > I'm now trying out different types of solver/preconditioner > available on > > > > PETSc to assess their performance. Is there other external solvers, > > > which > > > > PETSc interfaces, which are recommended? I'm thinking of using > multigrid > > > to > > > > solve the poisson eqn... wonder if hypre/BoomerAMG etc would help... > > > > > > > > > > > > On 2/11/07, Lisandro Dalcin wrote: > > > > > > > > > > On 2/10/07, Ben Tay wrote: > > > > > > In other words, for my CFD code, it is not possible to > parallelize > > > it > > > > > > effectively because the problem is too small? > > > > > > > > > > > > Is these true for all parallel solver, or just PETSc? I was > hoping > > > to > > > > > reduce > > > > > > the runtime since mine is an unsteady problem which requires > many > > > steps > > > > > to > > > > > > reach a periodic state and it takes many hours to reach it. > > > > > > > > > > Can you describe your specific application and how are you solving > it? > > > > > As Barry said, your need-for-speed is not likely to be solved by > > > > > running in parallel. > > > > > > > > > > > > > > > -- > > > > > Lisandro Dalc?n > > > > > --------------- > > > > > Centro Internacional de M?todos Computacionales en Ingenier?a > (CIMEC) > > > > > Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica > (INTEC) > > > > > Consejo Nacional de Investigaciones Cient?ficas y T?cnicas > (CONICET) > > > > > PTLC - G?emes 3450, (3000) Santa Fe, Argentina > > > > > Tel/Fax: +54-(0)342-451.1594 > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 2817836 bytes Desc: not available URL: From balay at mcs.anl.gov Sun Feb 11 21:41:29 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Sun, 11 Feb 2007 21:41:29 -0600 (CST) Subject: understanding the output from -info In-Reply-To: <804ab5d40702111921q2767248dte540b04e38a71236@mail.gmail.com> References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702100028sf595a2apae8aba2fda9251f3@mail.gmail.com> <804ab5d40702100117i5977f5bh9b161c026f16a32a@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> <804ab5d40702111626p2cbbf495ma954bcda1e3b75e8@mail.gmail.com> <804ab5d40702111921q2767248dte540b04e38a71236@mail.gmail.com> Message-ID: - If you have build isses [involing sending configure.log] please use petsc-maint at mcs.anl.gov address [not the mailing list] - Looks like you were using the following configure options: --with-cc=/scratch/g0306332/intel/cc/bin/icc --with-fc=/lsftmp/g0306332/inter/fc/bin/ifort --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi=0 --with-x=0 --with-shared But now - you are not specifing the compilers. The default compiler in your path must be Intel compilers version 7. Configure breaks with it. So sugest using the compilers that worked for you before. i.e --with-cc=/scratch/g0306332/intel/cc/bin/icc --with-fc=/lsftmp/g0306332/inter/fc/bin/ifort If you still have problem with hypre - remove externalpackages/hypre-1.11.1b and retry. Satish On Mon, 12 Feb 2007, Ben Tay wrote: > Hi, > > I tried to compile PETSc again and using --download-hypre=1. My command > given is > > ./config/configure.py --with-vendor-compilers=intel > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/ --wit > h-x=0 --with-shared --with-mpi-dir=/opt/mpich/myrinet/intel/ > --with-debugging=0 --download-hypre=1 > > I tried twice and the same error msg appears: > > Downloaded hypre could not be used. Please check install in > /nas/lsftmp/g0306332/petsc-2.3.2-p8/externalpackages/hypre-1.11.1b/linux-hypre. > I've attached the configure.log for your reference. > > Thank you. > From dimitri.lecas at c-s.fr Mon Feb 12 04:57:52 2007 From: dimitri.lecas at c-s.fr (LECAS Dimitri) Date: Mon, 12 Feb 2007 11:57:52 +0100 Subject: Partitioning on a mpiaij matrix Message-ID: <6590361b.361b6590@c-s.fr> ----- Original Message ----- From: Barry Smith Date: Friday, February 9, 2007 8:09 pm Subject: Re: Partitioning on a mpiaij matrix > > MatConvert() checks for a variety of converts; from the code > > /* 3) See if a good general converter is registered for the > desired class */ > conv = B->ops->convertfrom; > ierr = MatDestroy(B);CHKERRQ(ierr); > if (conv) goto foundconv; > > now MATMPIADJ has a MatConvertFrom that SHOULD be listed in the > function table > so it should not fall into the default MatConvert_Basic(). > > What version of PETSc are you using? Maybe an older one that does > not have > this converter? If you are using 2.3.2 or petsc-dev you can put a > breakpoint in MatConvert() and try to see why it is not picking up > the > convertfrom function? It is possible some bug that we are not aware of > but I have difficulty seeing what could be going wrong. 
> > Good luck, > > Barry > I add a line in matrix.c : /* 3) See if a good general converter is registered for the desired class */ fprintf(stderr, "Breakpoint : %p %p %p\n", B, B->ops, B->ops->convert); if (!conv) conv = B->ops->convert; ierr = MatDestroy(B);CHKERRQ(ierr); if (conv) goto foundconv; The output is : Breakpoint : 0x11c6670 0x11c6e30 (nil) [0]PETSC ERROR: --------------------- Error Message ---------------------------- -------- [0]PETSC ERROR: No support for this operation for this object type! [0]PETSC ERROR: Mat type mpiadj! [0]PETSC ERROR: ---------------------------------------------------------------- -------- [0]PETSC ERROR: Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 20 07 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 Where is the function that can convert a mpiaij into a mpiadj matrix ? -- Dimitri Lecas From bsmith at mcs.anl.gov Mon Feb 12 07:42:04 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 12 Feb 2007 07:42:04 -0600 (CST) Subject: Partitioning on a mpiaij matrix In-Reply-To: <6590361b.361b6590@c-s.fr> References: <6590361b.361b6590@c-s.fr> Message-ID: It is convertfrom, not convert you need to check. In src/mat/impls/adj/mpi/mpiadj.c MatCreate_MPIAdj there is the line ierr = PetscMemcpy(B->ops,&MatOps_Values,sizeof(struct _MatOps));CHKERRQ(ierr); in the MatOps_Values above it there is /*60*/ 0, MatDestroy_MPIAdj, MatView_MPIAdj, MatConvertFrom_MPIAdj, 0, Therefor the conversion function convertfrom MUST be in the matrix ops table when the convert is called. But it is not for you, how is this possible? Barry On Mon, 12 Feb 2007, LECAS Dimitri wrote: > > > ----- Original Message ----- > From: Barry Smith > Date: Friday, February 9, 2007 8:09 pm > Subject: Re: Partitioning on a mpiaij matrix > > > > > MatConvert() checks for a variety of converts; from the code > > > > /* 3) See if a good general converter is registered for the > > desired class */ > > conv = B->ops->convertfrom; > > ierr = MatDestroy(B);CHKERRQ(ierr); > > if (conv) goto foundconv; > > > > now MATMPIADJ has a MatConvertFrom that SHOULD be listed in the > > function table > > so it should not fall into the default MatConvert_Basic(). > > > > What version of PETSc are you using? Maybe an older one that does > > not have > > this converter? If you are using 2.3.2 or petsc-dev you can put a > > breakpoint in MatConvert() and try to see why it is not picking up > > the > > convertfrom function? It is possible some bug that we are not aware of > > but I have difficulty seeing what could be going wrong. > > > > Good luck, > > > > Barry > > > > I add a line in matrix.c : > /* 3) See if a good general converter is registered for the desired class */ > fprintf(stderr, "Breakpoint : %p %p %p\n", B, B->ops, B->ops->convert); > if (!conv) conv = B->ops->convert; > ierr = MatDestroy(B);CHKERRQ(ierr); > if (conv) goto foundconv; > > The output is : > Breakpoint : 0x11c6670 0x11c6e30 (nil) > [0]PETSC ERROR: --------------------- Error Message > ---------------------------- > -------- > [0]PETSC ERROR: No support for this operation for this object type! > [0]PETSC ERROR: Mat type mpiadj! > [0]PETSC ERROR: > ---------------------------------------------------------------- > -------- > [0]PETSC ERROR: Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 > 14:33:59 PST 20 > 07 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80 > > Where is the function that can convert a mpiaij into a mpiadj matrix ? 
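(For reference, a hypothetical C sketch of the call sequence being attempted in this thread: convert an assembled MPIAIJ matrix to MPIADJ and hand it to a MatPartitioning object. The function name is made up, and -mat_partitioning_type parmetis is just one possible choice; with MatConvertFrom_MPIAdj correctly registered, the MatConvert() call below is what should succeed:)

    #include "petscmat.h"

    /* A is an assembled MPIAIJ matrix */
    PetscErrorCode PartitionAIJ(Mat A,IS *is)
    {
      Mat             adj;
      MatPartitioning part;
      PetscErrorCode  ierr;

      PetscFunctionBegin;
      ierr = MatConvert(A,MATMPIADJ,MAT_INITIAL_MATRIX,&adj);CHKERRQ(ierr);
      ierr = MatPartitioningCreate(PETSC_COMM_WORLD,&part);CHKERRQ(ierr);
      ierr = MatPartitioningSetAdjacency(part,adj);CHKERRQ(ierr);
      ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); /* e.g. -mat_partitioning_type parmetis */
      ierr = MatPartitioningApply(part,is);CHKERRQ(ierr);
      ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);        /* 2.3.2-style destroys take the object itself */
      ierr = MatDestroy(adj);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }
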
> > From zonexo at gmail.com Mon Feb 12 09:19:32 2007 From: zonexo at gmail.com (Ben Tay) Date: Mon, 12 Feb 2007 23:19:32 +0800 Subject: External software help Message-ID: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> Hi, I'm trying to experiment with using external solvers. I have some questions: 1. Is there any difference in speed with calling the external software from PETSc or directly using them? 2. I tried to install MUMPS using --download-mumps. I was prompted to include --with-scalapack. After changing, I was again prompted to include --with-blacs. I changed and again I was told the need for --with-blacs-dir=. I thought I'm supposed to specify where to install blacs and entered a directory. But it seems that I need to specify the location of where blacs is. But I do not have it. So how do I solve that? 3. I installed some other external packages. I wanted to test their speed at solving equations. In the manual, I was told to use the runtime option -mat_type -ksp_type preonly -pc_type and also -help to get help msg. However when I tried to issue ./a.out -mat_type superlu -ksp_type preonly -pc_type lu, nothing happened. How should the command be issued? I tried to get help by running ./a.out -h what appears isn't what I want. Thank you very much. Regards. -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Feb 12 09:27:27 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 12 Feb 2007 09:27:27 -0600 (CST) Subject: External software help In-Reply-To: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> References: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007, Ben Tay wrote: > Hi, > > I'm trying to experiment with using external solvers. I have some questions: > > 1. Is there any difference in speed with calling the external software from > PETSc or directly using them? There is minor conversion overhead when you use them from PETSc. > > 2. I tried to install MUMPS using --download-mumps. I was prompted to > include --with-scalapack. After changing, I was again prompted to include > --with-blacs. I changed and again I was told the need for > --with-blacs-dir=. I thought I'm supposed to specify where to > install blacs and entered a directory. But it seems that I need to specify > the location of where blacs is. But I do not have it. So how do I solve > that? Mumps requires blacs & scalapack. So use: --download-blacs=1 --download-scalapack=1 --download-mumps=1 > > 3. I installed some other external packages. I wanted to test their speed at > solving equations. In the manual, I was told to use the runtime option > -mat_type -ksp_type preonly -pc_type and also -help to > get help msg. However when I tried to issue ./a.out -mat_type superlu > -ksp_type preonly -pc_type lu, nothing happened. How should the command be > issued? I tried to get help by running ./a.out -h what appears isn't what I > want. Did you install PETSc with superlu_dist? If so use '-mat_type superlu_dist' [Note: superlu & superlu_dist are different packages - the first one is sequential - the second one is parallel] Satish > > Thank you very much. Regards. 
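As a concrete illustration (the paths and process count are placeholders, and the download flags are the standard configure options), the whole chain might look like:

    ./config/configure.py --with-mpi-dir=/opt/mpich/myrinet/intel/ \
        --download-blacs=1 --download-scalapack=1 --download-mumps=1 \
        --download-superlu_dist=1

and then, at runtime, a parallel direct solve could be selected with:

    petscmpirun -n 4 ./a.out -ksp_type preonly -pc_type lu -mat_type superlu_dist
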
> From zonexo at gmail.com Mon Feb 12 09:53:35 2007 From: zonexo at gmail.com (Ben Tay) Date: Mon, 12 Feb 2007 23:53:35 +0800 Subject: External software help In-Reply-To: References: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> Message-ID: <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> Hi Satish, I've installed superlu. I issued the command ./a.out -mat_type superlu -ksp_type preonly -pc_type lu and it just hanged there. Is it because I had install it with mpich? I also wanted to try umfpack and plapack. Is it similar? Btw, plapack 's option isn't in pg 82 of the manual. Thank you. On 2/12/07, Satish Balay wrote: > > On Mon, 12 Feb 2007, Ben Tay wrote: > > > Hi, > > > > I'm trying to experiment with using external solvers. I have some > questions: > > > > 1. Is there any difference in speed with calling the external software > from > > PETSc or directly using them? > > There is minor conversion overhead when you use them from PETSc. > > > > > 2. I tried to install MUMPS using --download-mumps. I was prompted to > > include --with-scalapack. After changing, I was again prompted to > include > > --with-blacs. I changed and again I was told the need for > > --with-blacs-dir=. I thought I'm supposed to specify where to > > install blacs and entered a directory. But it seems that I need to > specify > > the location of where blacs is. But I do not have it. So how do I solve > > that? > > Mumps requires blacs & scalapack. So use: > --download-blacs=1 --download-scalapack=1 --download-mumps=1 > > > > > 3. I installed some other external packages. I wanted to test their > speed at > > solving equations. In the manual, I was told to use the runtime option > > -mat_type -ksp_type preonly -pc_type and also -help > to > > get help msg. However when I tried to issue ./a.out -mat_type superlu > > -ksp_type preonly -pc_type lu, nothing happened. How should the command > be > > issued? I tried to get help by running ./a.out -h what appears isn't > what I > > want. > > Did you install PETSc with superlu_dist? If so use '-mat_type > superlu_dist' > > [Note: superlu & superlu_dist are different packages - the first one > is sequential - the second one is parallel] > > Satish > > > > > Thank you very much. Regards. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zonexo at gmail.com Mon Feb 12 10:03:19 2007 From: zonexo at gmail.com (Ben Tay) Date: Tue, 13 Feb 2007 00:03:19 +0800 Subject: External software help In-Reply-To: <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> References: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> Message-ID: <804ab5d40702120803n66ece10cs927a178a20ec366d@mail.gmail.com> Btw, how is Trilinos/ML used and installed? Is the command to download also --download*-*Trilinos/ML ? waht about the command to use it? Thank you. On 2/12/07, Ben Tay wrote: > > Hi Satish, > > I've installed superlu. I issued the command ./a.out -mat_type > superlu -ksp_type preonly -pc_type lu and it just hanged there. Is it > because I had install it with mpich? I also wanted to try umfpack and > plapack. Is it similar? > > Btw, plapack 's option isn't in pg 82 of the manual. > > > Thank you. > > > On 2/12/07, Satish Balay wrote: > > > > On Mon, 12 Feb 2007, Ben Tay wrote: > > > > > Hi, > > > > > > I'm trying to experiment with using external solvers. I have some > > questions: > > > > > > 1. 
Is there any difference in speed with calling the external software > > from > > > PETSc or directly using them? > > > > There is minor conversion overhead when you use them from PETSc. > > > > > > > > 2. I tried to install MUMPS using --download-mumps. I was prompted to > > > > > include --with-scalapack. After changing, I was again prompted to > > include > > > --with-blacs. I changed and again I was told the need for > > > --with-blacs-dir=. I thought I'm supposed to specify where > > to > > > install blacs and entered a directory. But it seems that I need to > > specify > > > the location of where blacs is. But I do not have it. So how do I > > solve > > > that? > > > > Mumps requires blacs & scalapack. So use: > > --download-blacs=1 --download-scalapack=1 --download-mumps=1 > > > > > > > > 3. I installed some other external packages. I wanted to test their > > speed at > > > solving equations. In the manual, I was told to use the runtime option > > > > > -mat_type -ksp_type preonly -pc_type and also -help > > to > > > get help msg. However when I tried to issue ./a.out -mat_type superlu > > > -ksp_type preonly -pc_type lu, nothing happened. How should the > > command be > > > issued? I tried to get help by running ./a.out -h what appears isn't > > what I > > > want. > > > > Did you install PETSc with superlu_dist? If so use '-mat_type > > superlu_dist' > > > > [Note: superlu & superlu_dist are different packages - the first one > > is sequential - the second one is parallel] > > > > Satish > > > > > > > > Thank you very much. Regards. > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Feb 12 10:08:59 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 12 Feb 2007 10:08:59 -0600 (CST) Subject: A 3D example of KSPSolve? In-Reply-To: <918742.70603.qm@web36202.mail.mud.yahoo.com> References: <918742.70603.qm@web36202.mail.mud.yahoo.com> Message-ID: Well some how the inbalance comes up in your application run - but not in the test example. It is possible that the application stresses your machine/memory-subsytem a lot more than the test code. Your machine has a NUMA [Non-unimform memory access] - so some messages are local [if the memory is local - and others can take atleast 3 hops trhough the AMD memory/hypertransport network. I was assuming the delays due to multiple hops might show up in this test runs I requested. [but it does not]. So perhaps these multiple hops cause delays only when the memort network gets stressed - as with your application? http://www.thg.ru/cpu/20040929/images/opteron_8way.gif I guess we'll just have to use your app to benchmark. Earlier I sugested using latest mpich with '--device=ch3:sshm'. Another option to try is '--with-device=ch3:nemesis' To do these experiments - you can build different versions of PETSc [so that you can switch between them all]. i.e use a different value for PETSC_ARCH for each build: It is possible that some of the load imbalance happens before the communication stages - but its visible only in the scatter state [in log_summary]. So to get a better idea on this - we'll need a Barrier in VecScatterBegin(). Not sure how to do this. Barry: does -log_sync add a barrier in vecscatter? Also - can you confirm that no-one-else/no-other-application is using this machine when you perform these measurement runs? Satish On Sat, 10 Feb 2007, Shi Jin wrote: > Furthermore, I did a multi-process test on the SMP. 
> petscmpirun -n 3 taskset -c 0,2,4 ./ex2 -ksp_type cg > -log_summary | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 4.19617e-06 > Average time for zero size MPI_Send(): 3.65575e-06 > > petscmpirun -n 4 taskset -c 0,2,4,6 ./ex2 -ksp_type > cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.75953e-05 > Average time for zero size MPI_Send(): 2.44975e-05 > > petscmpirun -n 5 taskset -c 0,2,4,6,8 ./ex2 -ksp_type > cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 4.22001e-05 > Average time for zero size MPI_Send(): 2.54154e-05 > > petscmpirun -n 6 taskset -c 0,2,4,6,8,10 ./ex2 > -ksp_type cg -log_summary | egrep > \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 4.87804e-05 > Average time for zero size MPI_Send(): 1.83185e-05 > > petscmpirun -n 7 taskset -c 0,2,4,6,8,10,12 ./ex2 > -ksp_type cg -log_summary | egrep > \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 2.37942e-05 > Average time for zero size MPI_Send(): 5.00679e-06 > > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 ./ex2 > -ksp_type cg -log_summary | egrep > \(MPI_Send\|MPI_Barrier\) > Average time for MPI_Barrier(): 1.35899e-05 > Average time for zero size MPI_Send(): 6.73532e-06 > > They all seem quite fast. > Shi > > --- Shi Jin wrote: > > > Yes. The results follow. > > --- Satish Balay wrote: > > > > > Can you send the optupt from the following runs. > > You > > > can do this with > > > src/ksp/ksp/examples/tutorials/ex2.c - to keep > > > things simple. > > > > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.81198e-06 > > Average time for zero size MPI_Send(): 5.00679e-06 > > > petscmpirun -n 2 taskset -c 0,4 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.00272e-06 > > Average time for zero size MPI_Send(): 4.05312e-06 > > > petscmpirun -n 2 taskset -c 0,6 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.7643e-06 > > Average time for zero size MPI_Send(): 4.05312e-06 > > > petscmpirun -n 2 taskset -c 0,8 ./ex2 -log_summary > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.00272e-06 > > Average time for zero size MPI_Send(): 4.05312e-06 > > > petscmpirun -n 2 taskset -c 0,12 ./ex2 > > -log_summary > > > | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.57356e-06 > > Average time for zero size MPI_Send(): 5.48363e-06 > > > petscmpirun -n 2 taskset -c 0,14 ./ex2 > > -log_summary > > > | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.00272e-06 > > Average time for zero size MPI_Send(): 4.52995e-06 > > I also did > > petscmpirun -n 2 taskset -c 0,10 ./ex2 -log_summary > > | > > egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 5.00679e-06 > > Average time for zero size MPI_Send(): 3.93391e-06 > > > > > > The results are not so different from each other. > > Also > > please note, the timing is not exact, some times I > > got > > O(1e-5) timings for all cases. > > I assume these numbers are pretty good, right? Does > > it > > indicate that the MPI communication on a SMP machine > > is very fast? > > I will do a similar test on a cluster and report it > > back to the list. > > > > Shi > > > > > > > > > > > > > ____________________________________________________________________________________ > > Need Mail bonding? > > Go to the Yahoo! 
Mail Q&A for great tips from Yahoo! > > Answers users. > > > http://answers.yahoo.com/dir/?link=list&sid=396546091 > > > > > > > > > ____________________________________________________________________________________ > Yahoo! Music Unlimited > Access over 1 million songs. > http://music.yahoo.com/unlimited > > From balay at mcs.anl.gov Mon Feb 12 10:14:52 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 12 Feb 2007 10:14:52 -0600 (CST) Subject: External software help In-Reply-To: <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> References: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> Message-ID: On Mon, 12 Feb 2007, Ben Tay wrote: > Hi Satish, > > I've installed superlu. I issued the command ./a.out -mat_type > superlu -ksp_type preonly -pc_type lu and it just hanged there. Did you install superlu separately? Sugest installing with PETSc configure option '--download-superlu=1. > Is it because I had install it with mpich? No - its because superlu includes some blas code - that will hang if compiled 'with -O' - esp with intel compilers. PETSc configure handles this correctly. > I also wanted to try umfpack and plapack. Is it similar? > Btw, plapack 's option isn't in pg 82 of the manual. I believe plapack is for parallel dense usage - so perhaps its not appropriate for your usage.. Satish From hzhang at mcs.anl.gov Mon Feb 12 10:16:28 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 12 Feb 2007 10:16:28 -0600 (CST) Subject: External software help In-Reply-To: <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> References: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> Message-ID: You may test the installation of superlu using petsc example src/ksp/ksp/examples/tutorials/ex5.c: e.g., ./ex5 -ksp_type preonly -pc_type lu -mat_type superlu -ksp_view | more KSP Object: type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning PC Object: type: lu LU: out-of-place factorization matrix ordering: nd LU: tolerance for zero pivot 1e-12 LU: factor fill ratio needed 0 Factored matrix follows Matrix Object: type=superlu, rows=6, cols=6 total: nonzeros=0, allocated nonzeros=6 not using I-node routines SuperLU run parameters: Equil: NO ColPerm: 3 IterRefine: 0 SymmetricMode: NO DiagPivotThresh: 1 PivotGrowth: NO ConditionNumber: NO RowPerm: 0 ReplaceTinyPivot: NO PrintStat: NO lwork: 0 linear system matrix = precond matrix: Matrix Object: type=superlu, rows=6, cols=6 total: nonzeros=20, allocated nonzeros=30 not using I-node routines Norm of error < 1.e-12, Iterations 1 KSP Object: ... > I've installed superlu. I issued the command ./a.out -mat_type > superlu -ksp_type preonly -pc_type lu and it just hanged there. Is it > because I had install it with mpich? I also wanted to try umfpack and > plapack. Is it similar? > > Btw, plapack 's option isn't in pg 82 of the manual. I'll add it. Thanks, Hong > > > Thank you. > > > On 2/12/07, Satish Balay wrote: > > > > On Mon, 12 Feb 2007, Ben Tay wrote: > > > > > Hi, > > > > > > I'm trying to experiment with using external solvers. I have some > > questions: > > > > > > 1. Is there any difference in speed with calling the external software > > from > > > PETSc or directly using them? > > > > There is minor conversion overhead when you use them from PETSc. > > > > > > > > 2. 
I tried to install MUMPS using --download-mumps. I was prompted to > > > include --with-scalapack. After changing, I was again prompted to > > include > > > --with-blacs. I changed and again I was told the need for > > > --with-blacs-dir=. I thought I'm supposed to specify where to > > > install blacs and entered a directory. But it seems that I need to > > specify > > > the location of where blacs is. But I do not have it. So how do I solve > > > that? > > > > Mumps requires blacs & scalapack. So use: > > --download-blacs=1 --download-scalapack=1 --download-mumps=1 > > > > > > > > 3. I installed some other external packages. I wanted to test their > > speed at > > > solving equations. In the manual, I was told to use the runtime option > > > -mat_type -ksp_type preonly -pc_type and also -help > > to > > > get help msg. However when I tried to issue ./a.out -mat_type superlu > > > -ksp_type preonly -pc_type lu, nothing happened. How should the command > > be > > > issued? I tried to get help by running ./a.out -h what appears isn't > > what I > > > want. > > > > Did you install PETSc with superlu_dist? If so use '-mat_type > > superlu_dist' > > > > [Note: superlu & superlu_dist are different packages - the first one > > is sequential - the second one is parallel] > > > > Satish > > > > > > > > Thank you very much. Regards. > > > > > > > > From balay at mcs.anl.gov Mon Feb 12 10:22:16 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 12 Feb 2007 10:22:16 -0600 (CST) Subject: External software help In-Reply-To: <804ab5d40702120803n66ece10cs927a178a20ec366d@mail.gmail.com> References: <804ab5d40702120719ueaad01cy655372f7dbcd5d73@mail.gmail.com> <804ab5d40702120753j5f227637m70aae8e71a929e68@mail.gmail.com> <804ab5d40702120803n66ece10cs927a178a20ec366d@mail.gmail.com> Message-ID: On Tue, 13 Feb 2007, Ben Tay wrote: > Btw, how is Trilinos/ML used > and installed? Is the command to download also > --download*-*Trilinos/ML ? > waht about the command to use it? To install ML - use: --dowload-ml=1 Usage is: '-pc_type ml' Satish From dimitri.lecas at c-s.fr Mon Feb 12 10:44:03 2007 From: dimitri.lecas at c-s.fr (LECAS Dimitri) Date: Mon, 12 Feb 2007 17:44:03 +0100 Subject: Partitioning on a mpiaij matrix Message-ID: ----- Original Message ----- From: Barry Smith Date: Monday, February 12, 2007 2:42 pm Subject: Re: Partitioning on a mpiaij matrix > > It is convertfrom, not convert you need to check. > > In src/mat/impls/adj/mpi/mpiadj.c MatCreate_MPIAdj > there is the line > ierr = PetscMemcpy(B- > >ops,&MatOps_Values,sizeof(struct _MatOps));CHKERRQ(ierr); > in the MatOps_Values above it there is > /*60*/ 0, > MatDestroy_MPIAdj, > MatView_MPIAdj, > MatConvertFrom_MPIAdj, > 0, > > Therefor the conversion function convertfrom MUST be in the matrix > ops table > when the convert is called. But it is not for you, how is this > possible? > > Barry > I made some progress but it's not very comprehensible It's seems there is a problem with parmetis. When i compile petsc without parmetis, this line don't give an error. CALL Matconvert(mat,MATMPIADJ,MAT_INITIAL_MATRIX,mat2,ierr) CHKERRQ(ierr) With mat created with CALL MatCreateMPIAIJ (PETSC_COMM_WORLD, partTab(rk+1), partTab(rk+1), N, N, 30 , PETSC_NULL_INTEGER, 30, PETSC_NULL_INTEGER, mat, ierr) But, when petsc is compiled with parmetis, the call to MatConvert give the error [0]PETSC ERROR: No support for this operation for this object type! [0]PETSC ERROR: Mat type mpiadj! 
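For reference, a minimal C sketch of the convert-then-partition sequence being attempted above. It is not from the original thread, the helper name PartitionRows is made up for illustration, the calls follow PETSc 2.3.x-era signatures (the Destroy routines in particular changed in later releases), and it does not address the parmetis-related error itself; it only shows the intended usage.

#include "petscmat.h"

/* Hypothetical helper: compute a new row-to-process assignment for an
   assembled MPIAIJ matrix by converting it to MPIADJ and running a
   partitioner (e.g. parmetis) on the adjacency structure. */
PetscErrorCode PartitionRows(Mat A, IS *newproc)
{
  Mat             adj;
  MatPartitioning part;
  PetscErrorCode  ierr;

  PetscFunctionBegin;
  /* Only the nonzero structure is kept in the MPIADJ matrix */
  ierr = MatConvert(A, MATMPIADJ, MAT_INITIAL_MATRIX, &adj);CHKERRQ(ierr);

  ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, adj);CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); /* e.g. -mat_partitioning_type parmetis */
  ierr = MatPartitioningApply(part, newproc);CHKERRQ(ierr); /* target process for each local row */

  ierr = MatPartitioningDestroy(part);CHKERRQ(ierr); /* 2.3.x-style Destroy calls */
  ierr = MatDestroy(adj);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}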
-- Dimitri Lecas From bhatiamanav at gmail.com Mon Feb 12 17:26:33 2007 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Mon, 12 Feb 2007 15:26:33 -0800 Subject: nonlinear solvers Message-ID: Hi, I am using the nonlinear solvers in Petsc. My application requires the jacobian at the final nonlinear solution, since after the nonlinear solution I solve a linear system of equations with the jacobian as the system matrix. I am curious to know if it is safe to assume that for all nonlinear solvers in Petsc, the last jacobian used before convergence is same as the jacobian evaluated at the final solution. If this is the case, then I will not need to evaluate the jacobian again, otherwise, I will need to compute it again after the final solution. Kindly help me with your comments. Thanks, Manav From knepley at gmail.com Mon Feb 12 21:23:49 2007 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 12 Feb 2007 21:23:49 -0600 Subject: nonlinear solvers In-Reply-To: References: Message-ID: On 2/12/07, Manav Bhatia wrote: > Hi, > > I am using the nonlinear solvers in Petsc. My application requires > the jacobian at the final nonlinear solution, since after the > nonlinear solution I solve a linear system of equations with the > jacobian as the system matrix. > I am curious to know if it is safe to assume that for all > nonlinear solvers in Petsc, the last jacobian used before convergence > is same as the jacobian evaluated at the final solution. If this is > the case, then I will not need to evaluate the jacobian again, > otherwise, I will need to compute it again after the final solution. Never. We solve the Newton Equation, update the solution, and THEN check for convergence. The Jacobian would not be updated. Matt > Kindly help me with your comments. > > Thanks, > Manav > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie From jinzishuai at yahoo.com Mon Feb 12 22:22:14 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Mon, 12 Feb 2007 20:22:14 -0800 (PST) Subject: A 3D example of KSPSolve? In-Reply-To: Message-ID: <20070213042214.67470.qmail@web36215.mail.mud.yahoo.com> Thank you Satish. I cannot say that no one is using that machine when I ran. But I made sure that when I use taskset, the processors are exclusively mine. The total number of running jobs is always smaller than the available runs. I think I will stop benchmarking the SMP machine for the time being and focus my concentration on the code to run on a distributed memory cluster. I think it is very likely I can make some improvement to the existing code by tuning the linear solver and preconditioner. I am starting another thread on how to use the incomplete cholesky docomposition (ICC) as a preconditioner for my congugate gradient method. When I am satisfied with the code and its performance on a cluster, I will revisit the SMP issue so that we might achieve better performance when the number of processes is not too large (<-8). Thank you very much. 
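As a concrete starting point for that tuning, here is a minimal C sketch (not from the thread; the helper name SetupCG is made up) of wiring up CG while leaving the preconditioner switchable from the command line:

#include "petscksp.h"

/* Hypothetical helper: select CG for an SPD system, default to block
   Jacobi in parallel, and let runtime options override everything. */
PetscErrorCode SetupCG(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* e.g. -sub_pc_type icc -ksp_monitor */
  PetscFunctionReturn(0);
}

With this in place, the ICC-on-each-block variant discussed further down in this thread can be selected at runtime with -pc_type bjacobi -sub_pc_type icc, without recompiling.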
Shi T Shi --- Satish Balay wrote: > Well some how the inbalance comes up in your > application run - but not > in the test example. It is possible that the > application stresses your > machine/memory-subsytem a lot more than the test > code. > > Your machine has a NUMA [Non-unimform memory access] > - so some > messages are local [if the memory is local - and > others can take > atleast 3 hops trhough the AMD memory/hypertransport > network. I was > assuming the delays due to multiple hops might show > up in this test > runs I requested. [but it does not]. > > So perhaps these multiple hops cause delays only > when the memort > network gets stressed - as with your application? > > http://www.thg.ru/cpu/20040929/images/opteron_8way.gif > > I guess we'll just have to use your app to > benchmark. Earlier I > sugested using latest mpich with > '--device=ch3:sshm'. Another option > to try is '--with-device=ch3:nemesis' > > To do these experiments - you can build different > versions of PETSc > [so that you can switch between them all]. i.e use a > different value > for PETSC_ARCH for each build: > > It is possible that some of the load imbalance > happens before the > communication stages - but its visible only in the > scatter state [in > log_summary]. So to get a better idea on this - > we'll need a Barrier > in VecScatterBegin(). Not sure how to do this. > > Barry: does -log_sync add a barrier in vecscatter? > > Also - can you confirm that > no-one-else/no-other-application is using > this machine when you perform these measurement > runs? > > Satish > > On Sat, 10 Feb 2007, Shi Jin wrote: > > > Furthermore, I did a multi-process test on the > SMP. > > petscmpirun -n 3 taskset -c 0,2,4 ./ex2 -ksp_type > cg > > -log_summary | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 4.19617e-06 > > Average time for zero size MPI_Send(): 3.65575e-06 > > > > petscmpirun -n 4 taskset -c 0,2,4,6 ./ex2 > -ksp_type > > cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.75953e-05 > > Average time for zero size MPI_Send(): 2.44975e-05 > > > > petscmpirun -n 5 taskset -c 0,2,4,6,8 ./ex2 > -ksp_type > > cg -log_summary | egrep \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 4.22001e-05 > > Average time for zero size MPI_Send(): 2.54154e-05 > > > > petscmpirun -n 6 taskset -c 0,2,4,6,8,10 ./ex2 > > -ksp_type cg -log_summary | egrep > > \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 4.87804e-05 > > Average time for zero size MPI_Send(): 1.83185e-05 > > > > petscmpirun -n 7 taskset -c 0,2,4,6,8,10,12 ./ex2 > > -ksp_type cg -log_summary | egrep > > \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 2.37942e-05 > > Average time for zero size MPI_Send(): 5.00679e-06 > > > > petscmpirun -n 8 taskset -c 0,2,4,6,8,10,12,14 > ./ex2 > > -ksp_type cg -log_summary | egrep > > \(MPI_Send\|MPI_Barrier\) > > Average time for MPI_Barrier(): 1.35899e-05 > > Average time for zero size MPI_Send(): 6.73532e-06 > > > > They all seem quite fast. > > Shi > > > > --- Shi Jin wrote: > > > > > Yes. The results follow. > > > --- Satish Balay wrote: > > > > > > > Can you send the optupt from the following > runs. > > > You > > > > can do this with > > > > src/ksp/ksp/examples/tutorials/ex2.c - to keep > > > > things simple. 
> > > > > > > > petscmpirun -n 2 taskset -c 0,2 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 1.81198e-06 > > > Average time for zero size MPI_Send(): > 5.00679e-06 > > > > petscmpirun -n 2 taskset -c 0,4 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 2.00272e-06 > > > Average time for zero size MPI_Send(): > 4.05312e-06 > > > > petscmpirun -n 2 taskset -c 0,6 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 1.7643e-06 > > > Average time for zero size MPI_Send(): > 4.05312e-06 > > > > petscmpirun -n 2 taskset -c 0,8 ./ex2 > -log_summary > > > | > > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 2.00272e-06 > > > Average time for zero size MPI_Send(): > 4.05312e-06 > > > > petscmpirun -n 2 taskset -c 0,12 ./ex2 > > > -log_summary > > > > | egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 1.57356e-06 > > > Average time for zero size MPI_Send(): > 5.48363e-06 > > > > petscmpirun -n 2 taskset -c 0,14 ./ex2 > > > -log_summary > > > > | egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 2.00272e-06 > > > Average time for zero size MPI_Send(): > 4.52995e-06 > > > I also did > > > petscmpirun -n 2 taskset -c 0,10 ./ex2 > -log_summary > > > | > > > egrep \(MPI_Send\|MPI_Barrier\) > > > Average time for MPI_Barrier(): 5.00679e-06 > > > Average time for zero size MPI_Send(): > 3.93391e-06 > > > > > > > > > The results are not so different from each > other. > > > Also > > > please note, the timing is not exact, some times > I > > > got > > > O(1e-5) timings for all cases. > > > I assume these numbers are pretty good, right? > Does > > > it > > > indicate that the MPI communication on a SMP > machine > > > is very fast? > > > I will do a similar test on a cluster and report > it > > > back to the list. > > > > > > Shi > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Need Mail bonding? > > > Go to the Yahoo! Mail Q&A for great tips from > Yahoo! > > > Answers users. > > > > > > http://answers.yahoo.com/dir/?link=list&sid=396546091 > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > Yahoo! Music Unlimited > > Access over 1 million songs. > === message truncated === ____________________________________________________________________________________ Have a burning question? Go to www.Answers.yahoo.com and get answers from real people who know. From jinzishuai at yahoo.com Mon Feb 12 22:43:52 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Mon, 12 Feb 2007 20:43:52 -0800 (PST) Subject: Using ICC for MPISBAIJ? Message-ID: <205544.61365.qm@web36213.mail.mud.yahoo.com> Hi All, Thank you very much for the help you gave me in tuning my code. I now think it is important for us to take advantage of the symmetric positive definiteness property of our Matrix, i.e., we should use the conjugate gradient (CG) method with incomplete Cholesky decomposition (ICC) as the pre-conditioner (I assume this is commonly accepted at least for serial computation, right?). However, I am surprised and disappointed to realize that the -pc_type icc option only exists for seqsbaij Matrices. In order to parallelize the linear solver, I have to use the external package BlockSolve95. 
I took a look at this package at http://www-unix.mcs.anl.gov/sumaa3d/BlockSolve/ I am very disappointed to see it hasn't been in development ever since 1997. I am worried it does not provide a state-of-art performance. Nevertheless, I gave it a try. The package is not as easy to build as common linux software (even much worse than Petsc), especially according their REAME, it is unknown to work with linux. However, by hand-editing the bmake/linux/linux.site file, I seemed to be able to build the library. However, the examples doesn't build and the PETSC built with BlockSolve95 gives me errors in linking like: undefined referece to "dgemv_" and "dgetrf_". In another place of the PETSC mannul, I found there is another external package "Spooles" that can also be used with mpisbaij and Cholesky PC. But it is also dated in 1999. Could anyone give me some advice what is the best way to go to solve a large sparse symmetric positive definite linux system efficiently using MPI on a cluster? Thank you very much. Shi ____________________________________________________________________________________ Don't get soaked. Take a quick peak at the forecast with the Yahoo! Search weather shortcut. http://tools.search.yahoo.com/shortcuts/#loc_weather From hzhang at mcs.anl.gov Mon Feb 12 23:17:33 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 12 Feb 2007 23:17:33 -0600 (CST) Subject: Using ICC for MPISBAIJ? In-Reply-To: <205544.61365.qm@web36213.mail.mud.yahoo.com> References: <205544.61365.qm@web36213.mail.mud.yahoo.com> Message-ID: > Thank you very much for the help you gave me in tuning > my code. I now think it is important for us to take > advantage of the symmetric positive definiteness > property of our Matrix, i.e., we should use the > conjugate gradient (CG) method with incomplete > Cholesky decomposition (ICC) as the pre-conditioner (I > assume this is commonly accepted at least for serial > computation, right?). Yes. > However, I am surprised and disappointed to realize > that the -pc_type icc option only exists for seqsbaij > Matrices. In order to parallelize the linear solver, I icc also works for seqaij type, which enables more efficient data accessing than seqsbaij. > have to use the external package BlockSolve95. > I took a look at this package at > http://www-unix.mcs.anl.gov/sumaa3d/BlockSolve/ > I am very disappointed to see it hasn't been in > development ever since 1997. I am worried it does not > provide a state-of-art performance. > > Nevertheless, I gave it a try. The package is not as > easy to build as common linux software (even much > worse than Petsc), especially according their REAME, > it is unknown to work with linux. However, by > hand-editing the bmake/linux/linux.site file, I seemed > to be able to build the library. However, the examples > doesn't build and the PETSC built with BlockSolve95 > gives me errors in linking like: > undefined referece to "dgemv_" and "dgetrf_". This seems relates to linking lapack. Satish might knows about it. > > In another place of the PETSC mannul, I found there is > another external package "Spooles" that can also be > used with mpisbaij and Cholesky PC. But it is also > dated in 1999. Spooles is sparse direct solver. Although it has been out of support since 99, we find it is still in good quality, especially it has good robustness and portability. Petsc also interfaces with other well-maintained sparse direct solvers, e.g., mumps and superlu_dist. 
When matrices are in the order of 100k or less and ill-conditioned, the direct solvers are good choices. > > Could anyone give me some advice what is the best way > to go to solve a large sparse symmetric positive > definite linux system efficiently using MPI on a > cluster? The performance is application dependant. Petsc allows you testing various algorithms at runtime. Use '-help' to see all possible options. Run your application with '-log_summary' to collect and compare performance data. Good luck, Hong > > > > ____________________________________________________________________________________ > Don't get soaked. Take a quick peak at the forecast > with the Yahoo! Search weather shortcut. > http://tools.search.yahoo.com/shortcuts/#loc_weather > > From hzhang at mcs.anl.gov Mon Feb 12 23:22:48 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Mon, 12 Feb 2007 23:22:48 -0600 (CST) Subject: Using ICC for MPISBAIJ? In-Reply-To: <205544.61365.qm@web36213.mail.mud.yahoo.com> References: <205544.61365.qm@web36213.mail.mud.yahoo.com> Message-ID: I forget to tell you that you can use parallel CG with block-jacobi, and sequential icc within the diagonal blocks. Example, run src/ksp/ksp/examples/tutorials/ex5 with mpirun -np 2 ./ex5 -ksp_type cg -pc_type bjacobi -sub_pc_type icc -ksp_view Use '-help' to get many options on icc. Hong On Mon, 12 Feb 2007, Shi Jin wrote: > Hi All, > > Thank you very much for the help you gave me in tuning > my code. I now think it is important for us to take > advantage of the symmetric positive definiteness > property of our Matrix, i.e., we should use the > conjugate gradient (CG) method with incomplete > Cholesky decomposition (ICC) as the pre-conditioner (I > assume this is commonly accepted at least for serial > computation, right?). > However, I am surprised and disappointed to realize > that the -pc_type icc option only exists for seqsbaij > Matrices. In order to parallelize the linear solver, I > have to use the external package BlockSolve95. > I took a look at this package at > http://www-unix.mcs.anl.gov/sumaa3d/BlockSolve/ > I am very disappointed to see it hasn't been in > development ever since 1997. I am worried it does not > provide a state-of-art performance. > > Nevertheless, I gave it a try. The package is not as > easy to build as common linux software (even much > worse than Petsc), especially according their REAME, > it is unknown to work with linux. However, by > hand-editing the bmake/linux/linux.site file, I seemed > to be able to build the library. However, the examples > doesn't build and the PETSC built with BlockSolve95 > gives me errors in linking like: > undefined referece to "dgemv_" and "dgetrf_". > > In another place of the PETSC mannul, I found there is > another external package "Spooles" that can also be > used with mpisbaij and Cholesky PC. But it is also > dated in 1999. > > Could anyone give me some advice what is the best way > to go to solve a large sparse symmetric positive > definite linux system efficiently using MPI on a > cluster? > > Thank you very much. > Shi > > > > > ____________________________________________________________________________________ > Don't get soaked. Take a quick peak at the forecast > with the Yahoo! Search weather shortcut. 
> http://tools.search.yahoo.com/shortcuts/#loc_weather > > From zonexo at gmail.com Tue Feb 13 00:28:41 2007 From: zonexo at gmail.com (Ben Tay) Date: Tue, 13 Feb 2007 14:28:41 +0800 Subject: understanding the output from -info In-Reply-To: References: <804ab5d40702080747g69ee8b44h54cce509177ee0a8@mail.gmail.com> <804ab5d40702101702s71c974d7u39a97d6ab8058cf4@mail.gmail.com> <804ab5d40702102141s258ef22due0093263f83dc7bb@mail.gmail.com> <804ab5d40702111626p2cbbf495ma954bcda1e3b75e8@mail.gmail.com> <804ab5d40702111921q2767248dte540b04e38a71236@mail.gmail.com> Message-ID: <804ab5d40702122228k66de0664t62fe2e12db3be1a9@mail.gmail.com> Ya thanks for the suggestion. strangely it worked. However, if I had not included hypre, the original command also worked. Tks anyway. On 2/12/07, Satish Balay wrote: > > - If you have build isses [involing sending configure.log] please use > petsc-maint at mcs.anl.gov address [not the mailing list] > > - Looks like you were using the following configure options: > > --with-cc=/scratch/g0306332/intel/cc/bin/icc > --with-fc=/lsftmp/g0306332/inter/fc/bin/ifort > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 > --with-mpi=0 --with-x=0 --with-shared > > But now - you are not specifing the compilers. The default compiler in > your path must be Intel compilers version 7. Configure breaks with it. > So sugest using the compilers that worked for you before. i.e > > --with-cc=/scratch/g0306332/intel/cc/bin/icc > --with-fc=/lsftmp/g0306332/inter/fc/bin/ifort > > If you still have problem with hypre - remove > externalpackages/hypre-1.11.1b and retry. > > Satish > > On Mon, 12 Feb 2007, Ben Tay wrote: > > > Hi, > > > > I tried to compile PETSc again and using --download-hypre=1. My command > > given is > > > > ./config/configure.py --with-vendor-compilers=intel > > --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/ --wit > > h-x=0 --with-shared --with-mpi-dir=/opt/mpich/myrinet/intel/ > > --with-debugging=0 --download-hypre=1 > > > > I tried twice and the same error msg appears: > > > > Downloaded hypre could not be used. Please check install in > > /nas/lsftmp/g0306332/petsc-2.3.2-p8/externalpackages/hypre-1.11.1b > /linux-hypre. > > I've attached the configure.log for your reference. > > > > Thank you. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 13 09:07:20 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 13 Feb 2007 09:07:20 -0600 (CST) Subject: Using ICC for MPISBAIJ? In-Reply-To: <205544.61365.qm@web36213.mail.mud.yahoo.com> References: <205544.61365.qm@web36213.mail.mud.yahoo.com> Message-ID: For a moderate number of processes -pc_type bjacobi -sub_pc_type icc -sub_ksp_type preonly or -pc_type asm -sub_pc_type icc -sub_ksp_type preonly Barry On Mon, 12 Feb 2007, Shi Jin wrote: > Hi All, > > Thank you very much for the help you gave me in tuning > my code. I now think it is important for us to take > advantage of the symmetric positive definiteness > property of our Matrix, i.e., we should use the > conjugate gradient (CG) method with incomplete > Cholesky decomposition (ICC) as the pre-conditioner (I > assume this is commonly accepted at least for serial > computation, right?). > However, I am surprised and disappointed to realize > that the -pc_type icc option only exists for seqsbaij > Matrices. In order to parallelize the linear solver, I > have to use the external package BlockSolve95. 
> I took a look at this package at > http://www-unix.mcs.anl.gov/sumaa3d/BlockSolve/ > I am very disappointed to see it hasn't been in > development ever since 1997. I am worried it does not > provide a state-of-art performance. > > Nevertheless, I gave it a try. The package is not as > easy to build as common linux software (even much > worse than Petsc), especially according their REAME, > it is unknown to work with linux. However, by > hand-editing the bmake/linux/linux.site file, I seemed > to be able to build the library. However, the examples > doesn't build and the PETSC built with BlockSolve95 > gives me errors in linking like: > undefined referece to "dgemv_" and "dgetrf_". > > In another place of the PETSC mannul, I found there is > another external package "Spooles" that can also be > used with mpisbaij and Cholesky PC. But it is also > dated in 1999. > > Could anyone give me some advice what is the best way > to go to solve a large sparse symmetric positive > definite linux system efficiently using MPI on a > cluster? > > Thank you very much. > Shi > > > > > ____________________________________________________________________________________ > Don't get soaked. Take a quick peak at the forecast > with the Yahoo! Search weather shortcut. > http://tools.search.yahoo.com/shortcuts/#loc_weather > > From dimitri.lecas at c-s.fr Wed Feb 14 14:07:39 2007 From: dimitri.lecas at c-s.fr (LECAS Dimitri) Date: Wed, 14 Feb 2007 21:07:39 +0100 Subject: Differents number of iterations with the same problem Message-ID: <1fd0920e75.20e751fd09@c-s.fr> Hello I'm surprised to not have the same numbers of iterations when i run several instance of my program with the same number of processors and the same matrix/right hand side. Is there some random or it's asynchronous message passing fault ? PS: My code used KSPSolve with bicg and jacobi for pc) From knepley at gmail.com Wed Feb 14 14:14:51 2007 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 14 Feb 2007 14:14:51 -0600 Subject: Differents number of iterations with the same problem In-Reply-To: <1fd0920e75.20e751fd09@c-s.fr> References: <1fd0920e75.20e751fd09@c-s.fr> Message-ID: No there is no randomness. I suspect the matrix/rhs is not the same. Matt On 2/14/07, LECAS Dimitri wrote: > > Hello > > I'm surprised to not have the same numbers of iterations when i run > several instance of my program with the same number of processors and > the same matrix/right hand side. Is there some random or it's > asynchronous message passing fault ? > > PS: My code used KSPSolve with bicg and jacobi for pc) > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Wed Feb 14 16:47:03 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 14 Feb 2007 16:47:03 -0600 (CST) Subject: Differents number of iterations with the same problem In-Reply-To: <1fd0920e75.20e751fd09@c-s.fr> References: <1fd0920e75.20e751fd09@c-s.fr> Message-ID: With jacobi it probably requires a lot of iterations? Then it would not supprise me to have a slight difference in iteration count. If it is taking 1000's of iterations then I would expect large differences in iteration count. Barry On Wed, 14 Feb 2007, LECAS Dimitri wrote: > Hello > > I'm surprised to not have the same numbers of iterations when i run > several instance of my program with the same number of processors and > the same matrix/right hand side. Is there some random or it's > asynchronous message passing fault ? > > PS: My code used KSPSolve with bicg and jacobi for pc) > > From manav at u.washington.edu Thu Feb 15 02:56:53 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Thu, 15 Feb 2007 00:56:53 -0800 Subject: matrix addition Message-ID: Hi, I need to add two matrices, but I did not find a direct function to do so. Is there a specific reason not having a matrix addition function? In the absence of such a function, I am thinking of extracting each row of the two matrices and adding them. Would there be a more efficient way to do the same? Kindly help me with your advice. Thanks, Manav From DOMI0002 at ntu.edu.sg Thu Feb 15 03:24:08 2007 From: DOMI0002 at ntu.edu.sg (#DOMINIC DENVER JOHN CHANDAR#) Date: Thu, 15 Feb 2007 17:24:08 +0800 Subject: matrix addition In-Reply-To: Message-ID: How about MatAXPY() ? http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/ manualpages/Mat/MatAXPY.html Computes Y = aX + Y , where X, Y are matrices. Set a=1. -Dominic -----Original Message----- From: owner-petsc-users at mcs.anl.gov [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Manav Bhatia Sent: Thursday, February 15, 2007 4:57 PM To: petsc-users at mcs.anl.gov Subject: matrix addition Hi, I need to add two matrices, but I did not find a direct function to do so. Is there a specific reason not having a matrix addition function? In the absence of such a function, I am thinking of extracting each row of the two matrices and adding them. Would there be a more efficient way to do the same? Kindly help me with your advice. Thanks, Manav From manav at u.washington.edu Thu Feb 15 03:29:00 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Thu, 15 Feb 2007 01:29:00 -0800 Subject: matrix addition In-Reply-To: References: Message-ID: ooppss.... totally missed it... Thanks, :-) Manav On Feb 15, 2007, at 1:24 AM, #DOMINIC DENVER JOHN CHANDAR# wrote: > How about MatAXPY() ? > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/ > docs/ > manualpages/Mat/MatAXPY.html > > Computes Y = aX + Y , where X, Y are matrices. Set a=1. > > > -Dominic > > > > -----Original Message----- > From: owner-petsc-users at mcs.anl.gov > [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Manav Bhatia > Sent: Thursday, February 15, 2007 4:57 PM > To: petsc-users at mcs.anl.gov > Subject: matrix addition > > Hi, > I need to add two matrices, but I did not find a direct function to > do so. Is there a specific reason not having a matrix addition > function? > In the absence of such a function, I am thinking of extracting each > row of the two matrices and adding them. Would there be a more > efficient > way to do the same? > > Kindly help me with your advice. 
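For completeness, a short C sketch of the MatAXPY call Dominic points to (the helper name AddMatrices is made up; the argument order shown follows the 2.3.x manual page linked above and was different in older releases):

#include "petscmat.h"

/* Hypothetical helper: B <- B + A without touching individual rows. */
PetscErrorCode AddMatrices(Mat B, Mat A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* DIFFERENT_NONZERO_PATTERN is the safe choice; SAME_NONZERO_PATTERN
     is faster but only valid when A and B share a nonzero pattern. */
  ierr = MatAXPY(B, 1.0, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}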
> > Thanks, > Manav > From jianings at gmail.com Thu Feb 15 13:18:27 2007 From: jianings at gmail.com (Jianing Shi) Date: Thu, 15 Feb 2007 11:18:27 -0800 Subject: code design Message-ID: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> Hi Petsc masters, I have a question which is more about code design using Petsc. Suppose I need to implement a C++ library to provide an interface for users to set up the ODE and/or PDE systems, which I will solve on parallel computers using Petsc. Since Petsc has defined its own data type (in fact, a lot), PetscInt, PetscScalar, etc. I would like to link my own C++ library to the Petsc library. I imagine there are two solutions: 1) write an interface between my library and Petsc, i.e., between my own data structure (object-oriented) with the DA structure of Petsc. This requires translation between all the data type, for instance, int and PetscInt.... 2) use templated programming in my own library, so that when I link to the Petsc library, I can easily reuse my own code to set up the Right hand side, Jacobian and so on. Just wondering what is a good solution for an efficient and neat design? Thanks, Jianing From knepley at gmail.com Thu Feb 15 13:23:37 2007 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 15 Feb 2007 13:23:37 -0600 Subject: code design In-Reply-To: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> References: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> Message-ID: On 2/15/07, Jianing Shi wrote: > Hi Petsc masters, > > I have a question which is more about code design using Petsc. > > Suppose I need to implement a C++ library to provide an interface for > users to set up the ODE and/or PDE systems, which I will solve on > parallel computers using Petsc. Since Petsc has defined its own data > type (in fact, a lot), PetscInt, PetscScalar, etc. I would like to > link my own C++ library to the Petsc library. I imagine there are two > solutions: > > 1) write an interface between my library and Petsc, i.e., between my > own data structure (object-oriented) with the DA structure of Petsc. > This requires translation between all the data type, for instance, int > and PetscInt.... OO is really orthogonal to the introduction of new types. > 2) use templated programming in my own library, so that when I link to > the Petsc library, I can easily reuse my own code to set up the Right > hand side, Jacobian and so on. Yes, this is the correct way to handle .it. Matt > Just wondering what is a good solution for an efficient and neat design? > > Thanks, > > Jianing > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. 
-- Drummond Rennie From jianings at gmail.com Thu Feb 15 13:46:15 2007 From: jianings at gmail.com (Jianing Shi) Date: Thu, 15 Feb 2007 11:46:15 -0800 Subject: code design In-Reply-To: References: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> Message-ID: <63516a2e0702151146x5ac51efdw3547a42d46b497f6@mail.gmail.com> So follow up my puzzles for code design, do I have the same problem if I want to use SUNDIALS control script, and link it to Petsc library? I am trying to building up my own C++ library, handle it over to PVODE control script, and solve the underlying system using Petsc. Jianing > > I have a question which is more about code design using Petsc. > > > > Suppose I need to implement a C++ library to provide an interface for > > users to set up the ODE and/or PDE systems, which I will solve on > > parallel computers using Petsc. Since Petsc has defined its own data > > type (in fact, a lot), PetscInt, PetscScalar, etc. I would like to > > link my own C++ library to the Petsc library. I imagine there are two > > solutions: > > > > 1) write an interface between my library and Petsc, i.e., between my > > own data structure (object-oriented) with the DA structure of Petsc. > > This requires translation between all the data type, for instance, int > > and PetscInt.... > > OO is really orthogonal to the introduction of new types. > > > 2) use templated programming in my own library, so that when I link to > > the Petsc library, I can easily reuse my own code to set up the Right > > hand side, Jacobian and so on. > > Yes, this is the correct way to handle .it. > > Matt > > > Just wondering what is a good solution for an efficient and neat design? > > > > Thanks, > > > > Jianing From knepley at gmail.com Fri Feb 16 12:45:34 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 16 Feb 2007 12:45:34 -0600 Subject: code design In-Reply-To: <63516a2e0702151146x5ac51efdw3547a42d46b497f6@mail.gmail.com> References: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> <63516a2e0702151146x5ac51efdw3547a42d46b497f6@mail.gmail.com> Message-ID: I don't really understand the question, I guess. Matt On 2/15/07, Jianing Shi wrote: > So follow up my puzzles for code design, do I have the same problem if > I want to use SUNDIALS control script, and link it to Petsc library? > > I am trying to building up my own C++ library, handle it over to PVODE > control script, and solve the underlying system using Petsc. > > Jianing > > > > I have a question which is more about code design using Petsc. > > > > > > Suppose I need to implement a C++ library to provide an interface for > > > users to set up the ODE and/or PDE systems, which I will solve on > > > parallel computers using Petsc. Since Petsc has defined its own data > > > type (in fact, a lot), PetscInt, PetscScalar, etc. I would like to > > > link my own C++ library to the Petsc library. I imagine there are two > > > solutions: > > > > > > 1) write an interface between my library and Petsc, i.e., between my > > > own data structure (object-oriented) with the DA structure of Petsc. > > > This requires translation between all the data type, for instance, int > > > and PetscInt.... > > > > OO is really orthogonal to the introduction of new types. > > > > > 2) use templated programming in my own library, so that when I link to > > > the Petsc library, I can easily reuse my own code to set up the Right > > > hand side, Jacobian and so on. > > > > Yes, this is the correct way to handle .it. 
> > > > Matt > > > > > Just wondering what is a good solution for an efficient and neat design? > > > > > > Thanks, > > > > > > Jianing > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie From hzhang at mcs.anl.gov Fri Feb 16 13:30:20 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Fri, 16 Feb 2007 13:30:20 -0600 (CST) Subject: code design In-Reply-To: References: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> <63516a2e0702151146x5ac51efdw3547a42d46b497f6@mail.gmail.com> Message-ID: Jianing, > > On 2/15/07, Jianing Shi wrote: > > So follow up my puzzles for code design, do I have the same problem if > > I want to use SUNDIALS control script, and link it to Petsc library? I don't know what is SUNDIALS control script. We do support CVODE through petsc-sundials interface. > > > > I am trying to building up my own C++ library, handle it over to PVODE > > control script, and solve the underlying system using Petsc. We have intention to use CVODE's multi-time-step control, and use Petsc solving linear and non-linear systems at each time step. Is this what you need? Current interface uses SUNDAILS solvers. The interface is implemented in ~petsc/src/ts/impls/implicit/sundials/sundials.c Hong > > > > Jianing > > > > > > I have a question which is more about code design using Petsc. > > > > > > > > Suppose I need to implement a C++ library to provide an interface for > > > > users to set up the ODE and/or PDE systems, which I will solve on > > > > parallel computers using Petsc. Since Petsc has defined its own data > > > > type (in fact, a lot), PetscInt, PetscScalar, etc. I would like to > > > > link my own C++ library to the Petsc library. I imagine there are two > > > > solutions: > > > > > > > > 1) write an interface between my library and Petsc, i.e., between my > > > > own data structure (object-oriented) with the DA structure of Petsc. > > > > This requires translation between all the data type, for instance, int > > > > and PetscInt.... > > > > > > OO is really orthogonal to the introduction of new types. > > > > > > > 2) use templated programming in my own library, so that when I link to > > > > the Petsc library, I can easily reuse my own code to set up the Right > > > > hand side, Jacobian and so on. > > > > > > Yes, this is the correct way to handle .it. > > > > > > Matt > > > > > > > Just wondering what is a good solution for an efficient and neat design? > > > > > > > > Thanks, > > > > > > > > Jianing > > > > > > > -- > One trouble is that despite this system, anyone who reads journals widely > and critically is forced to realize that there are scarcely any bars to eventual > publication. 
There seems to be no study too fragmented, no hypothesis too > trivial, no literature citation too biased or too egotistical, no design too > warped, no methodology too bungled, no presentation of results too > inaccurate, too obscure, and too contradictory, no analysis too self-serving, > no argument too circular, no conclusions too trifling or too unjustified, and > no grammar and syntax too offensive for a paper to end up in print. -- > Drummond Rennie > > From jinzishuai at yahoo.com Fri Feb 16 14:18:22 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 16 Feb 2007 12:18:22 -0800 (PST) Subject: Problem creating a non-square MPIAIJ Matrix Message-ID: <581824.83974.qm@web36201.mail.mud.yahoo.com> Hi there, I am found a very mysterious problem of creating a MxN matrix when M>N. Please take a look at the attached short test code I wrote to demonstrate the problem. Basically, I want to create a 8x6 matrix. A={1 2 3 4 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0 If I do it with 2 processes, I suppose the local submatrices should look like p0: 1 2 3 4 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; p1: 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0 The problem is that on the first process, the diagonal portion of the local submatrix is 1 2 3 4 | 0 0; 0 0 0 0 | 0 0; 0 0 0 0 | 0 0; 0 0 0 0 | 0 0; So I need to set d_nnz[0]=4 on p0 since the first row has 4 diagonal nonzero entries. However, when I run the code by mpiexec -n 2 ./mpiaij I got error saying that [0]PETSC ERROR: --------------------- Error Message ------------------------------------ [0]PETSC ERROR: Argument out of range! [0]PETSC ERROR: nnz cannot be greater than row length: local row 0 value 4 rowlength 3! It seems that it is checking against the n parameter which is set by petsc to be n=N/2=3. But why should we do that? According the manual, the local submatrix is of dimension m by N. Could you please help me understand the problem? Thank you very much. Shi ____________________________________________________________________________________ TV dinner still cooling? Check out "Tonight's Picks" on Yahoo! TV. http://tv.yahoo.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: mpiaij.c Type: text/x-csrc Size: 977 bytes Desc: 3194231359-mpiaij.c URL: From balay at mcs.anl.gov Fri Feb 16 14:33:11 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 16 Feb 2007 14:33:11 -0600 (CST) Subject: Problem creating a non-square MPIAIJ Matrix In-Reply-To: <581824.83974.qm@web36201.mail.mud.yahoo.com> References: <581824.83974.qm@web36201.mail.mud.yahoo.com> Message-ID: > MatCreateMPIAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE, > 8,6,PETSC_DEFAULT,d_nnz ,PETSC_DEFAULT, o_nnz, &A); Here you are asking PETSc to decide tine local 'm,n' partition sizes. So you should use MatGetLocalSize() to get these values [and not assume they are sequare blocks] If you need to have the diagonal blocks square - then specify them appropriately at the matrix creation time [instead of using PETSC_DECIDE - for m,n parameters]. However - note that you'll have to create Vectors with matching parallel layout [when using in MatVec()] i.e in y = Ax x should match column layout of A y should match row layout of A Satish On Fri, 16 Feb 2007, Shi Jin wrote: > Hi there, > > I am found a very mysterious problem of creating a MxN > matrix when M>N. > Please take a look at the attached short test code I > wrote to demonstrate the problem. > > Basically, I want to create a 8x6 matrix. 
> A={1 2 3 4 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0 > > If I do it with 2 processes, I suppose the local > submatrices should look like > p0: > 1 2 3 4 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > p1: > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0; > 0 0 0 0 0 0 > The problem is that on the first process, the diagonal > portion of the local submatrix is > 1 2 3 4 | 0 0; > 0 0 0 0 | 0 0; > 0 0 0 0 | 0 0; > 0 0 0 0 | 0 0; > So I need to set d_nnz[0]=4 on p0 since the first row > has 4 diagonal nonzero entries. However, when I run > the code by > mpiexec -n 2 ./mpiaij > I got error saying that > [0]PETSC ERROR: --------------------- Error Message > ------------------------------------ > [0]PETSC ERROR: Argument out of range! > [0]PETSC ERROR: nnz cannot be greater than row length: > local row 0 value 4 rowlength 3! > > It seems that it is checking against the n parameter > which is set by petsc to be n=N/2=3. > But why should we do that? According the manual, the > local submatrix is of dimension m by N. > > Could you please help me understand the problem? > Thank you very much. > Shi > > > > ____________________________________________________________________________________ > TV dinner still cooling? > Check out "Tonight's Picks" on Yahoo! TV. > http://tv.yahoo.com/ From jinzishuai at yahoo.com Fri Feb 16 14:47:00 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 16 Feb 2007 12:47:00 -0800 (PST) Subject: Problem creating a non-square MPIAIJ Matrix In-Reply-To: Message-ID: <20070216204700.28042.qmail@web36211.mail.mud.yahoo.com> I actually used MatGetLocalSize(A,&m,&n) in the code. They give me m=4,n=3, as expected. I can also specify m=4,n=3 in MatCreateMPIAIJ() which is exactly identical to the previous code. If I specify anything else, I get error saying that they don't agree with the global sizes. But I don't understand why we need to make n=N/2? Are we storing the whole rows of the matrix? Just like the mannual says, the local submatrix is of size m*N. Shi --- Satish Balay wrote: > > > > MatCreateMPIAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE, > > 8,6,PETSC_DEFAULT,d_nnz ,PETSC_DEFAULT, > o_nnz, &A); > > Here you are asking PETSc to decide tine local 'm,n' > partition sizes. > > So you should use MatGetLocalSize() to get these > values [and not > assume they are sequare blocks] > > If you need to have the diagonal blocks square - > then specify them > appropriately at the matrix creation time [instead > of using > PETSC_DECIDE - for m,n parameters]. > > However - note that you'll have to create Vectors > with matching > parallel layout [when using in MatVec()] > > i.e in y = Ax > > x should match column layout of A > y should match row layout of A > > Satish > > On Fri, 16 Feb 2007, Shi Jin wrote: > > > Hi there, > > > > I am found a very mysterious problem of creating a > MxN > > matrix when M>N. > > Please take a look at the attached short test code > I > > wrote to demonstrate the problem. > > > > Basically, I want to create a 8x6 matrix. 
> > A={1 2 3 4 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0 > > > > If I do it with 2 processes, I suppose the local > > submatrices should look like > > p0: > > 1 2 3 4 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > p1: > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0; > > 0 0 0 0 0 0 > > The problem is that on the first process, the > diagonal > > portion of the local submatrix is > > 1 2 3 4 | 0 0; > > 0 0 0 0 | 0 0; > > 0 0 0 0 | 0 0; > > 0 0 0 0 | 0 0; > > So I need to set d_nnz[0]=4 on p0 since the first > row > > has 4 diagonal nonzero entries. However, when I > run > > the code by > > mpiexec -n 2 ./mpiaij > > I got error saying that > > [0]PETSC ERROR: --------------------- Error > Message > > ------------------------------------ > > [0]PETSC ERROR: Argument out of range! > > [0]PETSC ERROR: nnz cannot be greater than row > length: > > local row 0 value 4 rowlength 3! > > > > It seems that it is checking against the n > parameter > > which is set by petsc to be n=N/2=3. > > But why should we do that? According the manual, > the > > local submatrix is of dimension m by N. > > > > Could you please help me understand the problem? > > Thank you very much. > > Shi > > > > > > > > > ____________________________________________________________________________________ > > TV dinner still cooling? > > Check out "Tonight's Picks" on Yahoo! TV. > > http://tv.yahoo.com/ > > ____________________________________________________________________________________ Have a burning question? Go to www.Answers.yahoo.com and get answers from real people who know. From balay at mcs.anl.gov Fri Feb 16 15:03:56 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 16 Feb 2007 15:03:56 -0600 (CST) Subject: Problem creating a non-square MPIAIJ Matrix In-Reply-To: <20070216204700.28042.qmail@web36211.mail.mud.yahoo.com> References: <20070216204700.28042.qmail@web36211.mail.mud.yahoo.com> Message-ID: On Fri, 16 Feb 2007, Shi Jin wrote: > I actually used MatGetLocalSize(A,&m,&n) in the code. They give me > m=4,n=3, as expected. I can also specify m=4,n=3 in > MatCreateMPIAIJ() which is exactly identical to the previous code. > If I specify anything else, I get error saying that they don't agree > with the global sizes. What did you specify? Notice that in your partition scheme (m,n) have different values on each proc) (M,N = 8,6) (m0,n0 = 4,4) (m1,n1 = 4,2) 4 2 1 2 3 4 | 0 0 0 0 0 0 | 0 0 4 0 0 0 0 | 0 0 0 0 0 0 | 0 0 ------------- 0 0 0 0 | 0 0 4 0 0 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 | 0 0 Howeve note that you get a square diagonal block on proc-0 [4x4] but not on proc-1 [4x2] . Its probably best to use the default PETSc partitioning scheme then this alternative one. > But I don't understand why we need to make n=N/2? This is the default partitioning scheme - when you specify PETSC_DECIDE for m,n. Here we choose to divide things as evenly as possible. > Are we storing the whole rows of the matrix? Just like > the mannual says, the local submatrix is of size m*N. We are stoing the diagonal block and offdiagonal block separately. However both blocks are on the same processor. i.e each processor stores m*N values - in 2 submatrices m*n, m*(N-n). To understand this better - check manpage for MatCreateMPIAIJ(). 
Satish From jinzishuai at yahoo.com Fri Feb 16 15:34:22 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 16 Feb 2007 13:34:22 -0800 (PST) Subject: Problem creating a non-square MPIAIJ Matrix In-Reply-To: Message-ID: <152812.69083.qm@web36203.mail.mud.yahoo.com> > We are stoing the diagonal block and offdiagonal > block > separately. However both blocks are on the same > processor. i.e each > processor stores m*N values - in 2 submatrices m*n, > m*(N-n). To > understand this better - check manpage for > MatCreateMPIAIJ(). Thanks. But this is completely different from what I read from the PETSC mannual. http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html Here is says: "The DIAGONAL portion of the local submatrix of a processor can be defined as the submatrix which is obtained by extraction the part corresponding to the rows r1-r2 and columns r1-r2 of the global matrix, where r1 is the first row that belongs to the processor, and r2 is the last row belonging to the this processor. This is a square mxm matrix. The remaining portion of the local submatrix (mxN) constitute the OFF-DIAGONAL portion." So the two matrices are mxm and mx(N-m) instead of what you said: mxn and mx(N-n) However, the code seems to act like what you described. They are equivalent for square matrices but not the same for non-square matrices like the one I showed. Could you please clarify whether the manual is accurate or not? Thank you very much. Shi ____________________________________________________________________________________ Sucker-punch spam with award-winning protection. Try the free Yahoo! Mail Beta. http://advision.webevents.yahoo.com/mailbeta/features_spam.html From balay at mcs.anl.gov Fri Feb 16 15:55:24 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 16 Feb 2007 15:55:24 -0600 (CST) Subject: Problem creating a non-square MPIAIJ Matrix In-Reply-To: <152812.69083.qm@web36203.mail.mud.yahoo.com> References: <152812.69083.qm@web36203.mail.mud.yahoo.com> Message-ID: On Fri, 16 Feb 2007, Shi Jin wrote: > > We are stoing the diagonal block and offdiagonal block > > separately. However both blocks are on the same processor. i.e > > each processor stores m*N values - in 2 submatrices m*n, > > m*(N-n). To understand this better - check manpage for > > MatCreateMPIAIJ(). > Thanks. But this is completely different from what I > read from the PETSC mannual. > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html > Here is says: > "The DIAGONAL portion of the local submatrix of a > processor can be defined as the submatrix which is > obtained by extraction the part corresponding to the > rows r1-r2 and columns r1-r2 of the global matrix, > where r1 is the first row that belongs to the > processor, and r2 is the last row belonging to the > this processor. This is a square mxm matrix. The > remaining portion of the local submatrix (mxN) > constitute the OFF-DIAGONAL portion." This text was proably writen assuming that the initial matrix was a square MxM matrix [in which case the diagonal blocks are also square]. This text should be corrected to reflect the 'rectangular' matrix case as well. > So the two matrices are mxm and mx(N-m) instead of what you said: > mxn and mx(N-n) However, the code seems to act like what you > described. This interpreatation of the partitionling is not possible. You are assuming the following partitioning - which PETSc doesn't support. 
1 2 3 4 | 0 0 0 0 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 | 0 0 ------------- 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 | 0 0 0 0 Note: the primary purpose of storing diagonal & offdiagonal blocks is to separate comutation that requires messages from compuattion that does not - in a MatVec. i.e We the current petsc partitioning - the diagonal block can be processed without any communication. [with a matching vec partitioning - as mentioned in an earlier e-mail] The above scheme - with different column partitioning on each node - doesn't help with this [and removes the primary purpose for storing the matrix blocks separately] Satish > They are equivalent for square matrices but not the same for > non-square matrices like the one I showed. Could you please clarify > whether the manual is accurate or not? From jianings at gmail.com Fri Feb 16 15:57:18 2007 From: jianings at gmail.com (Jianing Shi) Date: Fri, 16 Feb 2007 15:57:18 -0600 Subject: code design In-Reply-To: References: <63516a2e0702151118r7aeed903y40ebeab1be9a9170@mail.gmail.com> <63516a2e0702151146x5ac51efdw3547a42d46b497f6@mail.gmail.com> Message-ID: <63516a2e0702161357n5e982c0cu2d1ea91619c589df@mail.gmail.com> > We have intention to use CVODE's multi-time-step control, and use Petsc > solving linear and non-linear systems at each time step. > Is this what you need? Yes, that is exactly what I mean. > Current interface uses SUNDAILS solvers. > > The interface is implemented in > ~petsc/src/ts/impls/implicit/sundials/sundials.c Thanks, I will look into that. Jianing From jinzishuai at yahoo.com Fri Feb 16 16:07:34 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Fri, 16 Feb 2007 14:07:34 -0800 (PST) Subject: Problem creating a non-square MPIAIJ Matrix In-Reply-To: Message-ID: <155636.80641.qm@web36203.mail.mud.yahoo.com> Thanks. Now I know I followed the documentation that is mistaken. I will change the code according to your description. Thank you very much. Shi --- Satish Balay wrote: > On Fri, 16 Feb 2007, Shi Jin wrote: > > > > We are stoing the diagonal block and offdiagonal > block > > > separately. However both blocks are on the same > processor. i.e > > > each processor stores m*N values - in 2 > submatrices m*n, > > > m*(N-n). To understand this better - check > manpage for > > > MatCreateMPIAIJ(). > > > Thanks. But this is completely different from what > I > > read from the PETSC mannual. > > > http://www-unix.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatCreateMPIAIJ.html > > Here is says: > > "The DIAGONAL portion of the local submatrix of a > > processor can be defined as the submatrix which is > > obtained by extraction the part corresponding to > the > > rows r1-r2 and columns r1-r2 of the global matrix, > > where r1 is the first row that belongs to the > > processor, and r2 is the last row belonging to the > > this processor. This is a square mxm matrix. The > > remaining portion of the local submatrix (mxN) > > constitute the OFF-DIAGONAL portion." > > This text was proably writen assuming that the > initial matrix was a > square MxM matrix [in which case the diagonal blocks > are also > square]. This text should be corrected to reflect > the 'rectangular' > matrix case as well. > > > So the two matrices are mxm and mx(N-m) instead > of what you said: > > mxn and mx(N-n) However, the code seems to act > like what you > > described. > > This interpreatation of the partitionling is not > possible. You are > assuming the following partitioning - which PETSc > doesn't support. 
> > > 1 2 3 4 | 0 0 > 0 0 0 0 | 0 0 > 0 0 0 0 | 0 0 > 0 0 0 0 | 0 0 > ------------- > 0 0 | 0 0 0 0 > 0 0 | 0 0 0 0 > 0 0 | 0 0 0 0 > 0 0 | 0 0 0 0 > > Note: the primary purpose of storing diagonal & > offdiagonal blocks is > to separate comutation that requires messages from > compuattion that > does not - in a MatVec. > > i.e We the current petsc partitioning - the diagonal > block can be > processed without any communication. [with a > matching vec partitioning > - as mentioned in an earlier e-mail] > > The above scheme - with different column > partitioning on each node - > doesn't help with this [and removes the primary > purpose for storing > the matrix blocks separately] > > Satish > > > They are equivalent for square matrices but not > the same for > > non-square matrices like the one I showed. Could > you please clarify > > whether the manual is accurate or not? > > ____________________________________________________________________________________ Don't pick lemons. See all the new 2007 cars at Yahoo! Autos. http://autos.yahoo.com/new_cars.html From svm at cfdrc.com Fri Feb 16 17:44:25 2007 From: svm at cfdrc.com (Saikrishna V. Marella) Date: Fri, 16 Feb 2007 17:44:25 -0600 Subject: MatSetValuesBlocked In-Reply-To: <155636.80641.qm@web36203.mail.mud.yahoo.com> References: <155636.80641.qm@web36203.mail.mud.yahoo.com> Message-ID: <000301c75224$642e3a80$10fda8c0@svmwin64> Hey guys, How are the block matrices assembled when using MatSetValuesBlocked(mat,m,idxm[],n,idxn[],v[],addv). Suppose m=n=2 and block size(bs) = 2 The matrix I am trying to assemble is 1 2 | 3 4 5 6 | 7 8 - - - | - - - 9 10 | 11 12 13 14 | 15 16 The manual says, v[] should be row-oriented. For Block matrices what does that mean? Should the array v[] being passed in look like a) v[] = [1,2,5,6,3,4,7,8,9,10,13,14,11,12,15,16] OR b) v[] = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] Thanks. Sai Marella. _____________________________________________________ Saikrishna Marella, (PhD), Project Engineer CFD Research Corp. 215 Wynn Dr. Huntsville AL 35805 Tel: 256-726-4954,(4800), Fax(4806) , svm at cfdrc.com Home Page: http://www.cfdrc.com -------------- next part -------------- A non-text attachment was scrubbed... Name: Saikrishna(Sai) Marella (svm at cfdrc.com).vcf Type: text/x-vcard Size: 420 bytes Desc: not available URL: From bsmith at mcs.anl.gov Fri Feb 16 20:38:13 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 16 Feb 2007 20:38:13 -0600 (CST) Subject: MatSetValuesBlocked In-Reply-To: <000301c75224$642e3a80$10fda8c0@svmwin64> References: <155636.80641.qm@web36203.mail.mud.yahoo.com> <000301c75224$642e3a80$10fda8c0@svmwin64> Message-ID: It is b) with the column oriented option it would be 1 5 9 13 2 6 .... Barry I will add your example to the manual page to make this absolutly clear for future users. On Fri, 16 Feb 2007, Saikrishna V. Marella wrote: > Hey guys, > > How are the block matrices assembled when using > MatSetValuesBlocked(mat,m,idxm[],n,idxn[],v[],addv). > > Suppose m=n=2 and block size(bs) = 2 The matrix I am trying to assemble is > > 1 2 | 3 4 > 5 6 | 7 8 > - - - | - - - > 9 10 | 11 12 > 13 14 | 15 16 > > The manual says, v[] should be row-oriented. For Block matrices what does > that mean? > > Should the array v[] being passed in look like > > a) v[] = [1,2,5,6,3,4,7,8,9,10,13,14,11,12,15,16] > > OR > > b) v[] = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] > > Thanks. > Sai Marella. 
> _____________________________________________________ > Saikrishna Marella, (PhD), Project Engineer > CFD Research Corp. 215 Wynn Dr. Huntsville AL 35805 > Tel: 256-726-4954,(4800), Fax(4806) , svm at cfdrc.com > Home Page: http://www.cfdrc.com > From manav at u.washington.edu Sun Feb 18 07:08:14 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Sun, 18 Feb 2007 05:08:14 -0800 Subject: fill for matrix multiplication Message-ID: Hi, I am performing a matrix multiplication of two dense matrices with both MatMatMult and MatMatMultTranspose. What do I choose a fill factor as? According to the definition in the documentation: fiill = expected fill as ratio of nnz(C)/(nnz(A) + nnz(B)). So, If I have two full matrices A and B, I will get a full matrix as my result. Hence, the fill factor will be 0.5. However, both these methods give an error with a fill factor less than 1.0. Also, if I use PETSC_DFAULT as the fill argument, it agains results in an error since its value is -2. Kindly help me with your advice here. Thanks Manav -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Feb 18 12:34:47 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 18 Feb 2007 12:34:47 -0600 (CST) Subject: fill for matrix multiplication In-Reply-To: References: Message-ID: Use 1.0 for dense matrices; it is ignored since dense matrices are always dense. Barry On Sun, 18 Feb 2007, Manav Bhatia wrote: > Hi, > > I am performing a matrix multiplication of two dense matrices with both > MatMatMult and MatMatMultTranspose. > What do I choose a fill factor as? According to the definition in the > documentation: fiill = expected fill as ratio of nnz(C)/(nnz(A) + nnz(B)). > So, If I have two full matrices A and B, I will get a full matrix as my > result. Hence, the fill factor will be 0.5. However, both these methods give > an error with a fill factor less than 1.0. Also, if I use PETSC_DFAULT as the > fill argument, it agains results in an error since its value is -2. > > Kindly help me with your advice here. > > Thanks > Manav From hzhang at mcs.anl.gov Sun Feb 18 14:12:25 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Sun, 18 Feb 2007 14:12:25 -0600 (CST) Subject: fill for matrix multiplication In-Reply-To: References: Message-ID: Manav, I've enabled fill=PETSC_DFAULT for MatMatMultSymbolic(). We recommend using MatMatMult() instead of MatMatMultSymbolic() and MatMatMultNumeric(). Thanks for reporting the problem, Hong On Sun, 18 Feb 2007, Barry Smith wrote: > > Use 1.0 for dense matrices; it is ignored since dense matrices > are always dense. > > Barry > > > On Sun, 18 Feb 2007, Manav Bhatia wrote: > > > Hi, > > > > I am performing a matrix multiplication of two dense matrices with both > > MatMatMult and MatMatMultTranspose. > > What do I choose a fill factor as? According to the definition in the > > documentation: fiill = expected fill as ratio of nnz(C)/(nnz(A) + nnz(B)). > > So, If I have two full matrices A and B, I will get a full matrix as my > > result. Hence, the fill factor will be 0.5. However, both these methods give > > an error with a fill factor less than 1.0. Also, if I use PETSC_DFAULT as the > > fill argument, it agains results in an error since its value is -2. > > > > Kindly help me with your advice here. 
> > > > Thanks > > Manav > > From jens.madsen at risoe.dk Mon Feb 19 07:34:27 2007 From: jens.madsen at risoe.dk (jens.madsen at risoe.dk) Date: Mon, 19 Feb 2007 14:34:27 +0100 Subject: Can I please come off the mailing list for a while References: Message-ID: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 2820 bytes Desc: not available URL: From manav at u.washington.edu Tue Feb 20 17:38:42 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Tue, 20 Feb 2007 15:38:42 -0800 Subject: TS Message-ID: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> Hi I am preparing my code to use the TS capability of Petsc, and I had a few doubts to clear up. These primarily relate to the set up of the problem, and I have stated them below. Please correct me if I am wrong. -- For a linear transient problem, I understand that the following different combinations are possible: 1> A(t) U_t = f (t) where I will have to call the setLHSMatrix() and setRHSFunction() functions during set up. 2> A(t) U_t = B(t) U where I will have to call the setLHSMatrix() and setRHSMatrix() functions during set up. 3> U_t = f(t) where I call only the setRHSfunction() 4> U_t = A(t) U where I call only the setRHSMatrix() -- For a nonlinear transient problem, I understand that the following different combinations are possible: 1> A(t) U_t = f (U, t) where I will have to call the setLHSMatrix() and setRHSFunction(), and setRHSJacobian() functions during set up. 2> U_t = f(U, t) where I will have to call the setRHSfunction() and setRHSJacobian() functions during set up. -- setting KSP and PC types From what I understand, I can set up the KSP and PC type of the transient solver, which will be used only if I specify an A matrix for the problem. In addition, I can independently set the KSP and PC type of the SNES used by TS, which is used by the time solver. Kindly help me with your advice here. Thanks, Manav From knepley at gmail.com Tue Feb 20 17:47:54 2007 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 20 Feb 2007 17:47:54 -0600 Subject: TS In-Reply-To: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> Message-ID: On 2/20/07, Manav Bhatia wrote: > Hi > > I am preparing my code to use the TS capability of Petsc, and I > had a few doubts to clear up. These primarily relate to the set up of > the problem, and I have stated them below. Please correct me if I am > wrong. > > -- For a linear transient problem, I understand that the following > different combinations are possible: > 1> A(t) U_t = f (t) > where I will have to call the setLHSMatrix() and setRHSFunction() > functions during set up. > 2> A(t) U_t = B(t) U > where I will have to call the setLHSMatrix() and setRHSMatrix() > functions during set up. > 3> U_t = f(t) > where I call only the setRHSfunction() > 4> U_t = A(t) U > where I call only the setRHSMatrix() > > > -- For a nonlinear transient problem, I understand that the following > different combinations are possible: > 1> A(t) U_t = f (U, t) > where I will have to call the setLHSMatrix() and setRHSFunction(), > and setRHSJacobian() functions during set up. > 2> U_t = f(U, t) > where I will have to call the setRHSfunction() and setRHSJacobian() > functions during set up. This sounds correct. You do not have to specify the Jacobian, as we can automatically give a FD approximation, but it is better to do so. 
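For concreteness, a minimal self-contained sketch of the nonlinear case 2> above, U_t = f(U,t), is below for the toy problem du/dt = -u. It is not taken from this thread: the function names and the toy right-hand side are made up, and the calling sequences (TSSetRHSFunction, TSSetRHSJacobian, TSSetDuration, TSStep) follow the 2.3.x-era TS interface, so check them against the manual pages of the version you actually run.

    #include "petscts.h"

    /* f(U,t) = -U */
    PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec u,Vec f,void *ctx)
    {
      PetscErrorCode ierr;
      ierr = VecCopy(u,f);CHKERRQ(ierr);
      ierr = VecScale(f,-1.0);CHKERRQ(ierr);
      return 0;
    }

    /* Jacobian of f with respect to U: J = -I */
    PetscErrorCode RHSJacobian(TS ts,PetscReal t,Vec u,Mat *J,Mat *B,MatStructure *flag,void *ctx)
    {
      PetscErrorCode ierr;
      PetscInt       i,rstart,rend;
      PetscScalar    v = -1.0;
      ierr = MatGetOwnershipRange(*J,&rstart,&rend);CHKERRQ(ierr);
      for (i=rstart; i<rend; i++) {
        ierr = MatSetValues(*J,1,&i,1,&i,&v,INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(*J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      *flag = SAME_NONZERO_PATTERN;
      return 0;
    }

    int main(int argc,char **argv)
    {
      TS             ts;
      Vec            u;
      Mat            J;
      PetscInt       n = 10,steps;
      PetscReal      ftime;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);
      ierr = VecCreate(PETSC_COMM_WORLD,&u);CHKERRQ(ierr);
      ierr = VecSetSizes(u,PETSC_DECIDE,n);CHKERRQ(ierr);
      ierr = VecSetFromOptions(u);CHKERRQ(ierr);
      ierr = VecSet(u,1.0);CHKERRQ(ierr);                 /* initial condition */
      ierr = MatCreate(PETSC_COMM_WORLD,&J);CHKERRQ(ierr);
      ierr = MatSetSizes(J,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
      ierr = MatSetFromOptions(J);CHKERRQ(ierr);

      ierr = TSCreate(PETSC_COMM_WORLD,&ts);CHKERRQ(ierr);
      ierr = TSSetProblemType(ts,TS_NONLINEAR);CHKERRQ(ierr);
      ierr = TSSetRHSFunction(ts,RHSFunction,PETSC_NULL);CHKERRQ(ierr);
      ierr = TSSetRHSJacobian(ts,J,J,RHSJacobian,PETSC_NULL);CHKERRQ(ierr);
      ierr = TSSetType(ts,TSBEULER);CHKERRQ(ierr);        /* implicit, so SNES/KSP get used */
      ierr = TSSetInitialTimeStep(ts,0.0,0.01);CHKERRQ(ierr);
      ierr = TSSetDuration(ts,100,1.0);CHKERRQ(ierr);
      ierr = TSSetSolution(ts,u);CHKERRQ(ierr);
      ierr = TSSetFromOptions(ts);CHKERRQ(ierr);
      ierr = TSStep(ts,&steps,&ftime);CHKERRQ(ierr);

      ierr = TSDestroy(ts);CHKERRQ(ierr);
      ierr = MatDestroy(J);CHKERRQ(ierr);
      ierr = VecDestroy(u);CHKERRQ(ierr);
      ierr = PetscFinalize();CHKERRQ(ierr);
      return 0;
    }

For the linear cases 3> and 4> the same skeleton applies with TS_LINEAR and TSSetRHSMatrix() in place of the function/Jacobian pair.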
> > -- setting KSP and PC types > From what I understand, I can set up the KSP and PC type of the > transient solver, which will be used only if I specify an A matrix > for the problem. In addition, I can independently set the KSP and PC > type of the SNES used by TS, which is used by the time solver. The solver is only used if you specify an implicit method. The KSP and PC type are used by either a SNES or just the KSP itself depending on whether the problem is nonlinear. Matt > > Kindly help me with your advice here. > > Thanks, > Manav > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie From manav at u.washington.edu Tue Feb 20 23:00:31 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Tue, 20 Feb 2007 21:00:31 -0800 Subject: TS In-Reply-To: References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> Message-ID: <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> > >> >> -- setting KSP and PC types >> From what I understand, I can set up the KSP and PC type of the >> transient solver, which will be used only if I specify an A matrix >> for the problem. In addition, I can independently set the KSP and PC >> type of the SNES used by TS, which is used by the time solver. > > The solver is only used if you specify an implicit method. The KSP > and PC > type are used by either a SNES or just the KSP itself depending on > whether > the problem is nonlinear. > So, if I have a problem with a LHS matrix, and I want to use an explicit method, then do I have to invert the matrix before asking the solver to run? i.e. the solver will not do that for me? Thanks, Manav From hzhang at mcs.anl.gov Tue Feb 20 23:16:04 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Tue, 20 Feb 2007 23:16:04 -0600 (CST) Subject: TS In-Reply-To: <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> Message-ID: On Tue, 20 Feb 2007, Manav Bhatia wrote: > > > > So, if I have a problem with a LHS matrix, and I want to use an > explicit method, then do I have to invert the matrix before asking > the solver to run? i.e. the solver will not do that for me? The LHS matrix formulation is not implemented for the explicit method. Do you have such application? Hong From manav at u.washington.edu Wed Feb 21 00:28:48 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Tue, 20 Feb 2007 22:28:48 -0800 Subject: TS In-Reply-To: References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> Message-ID: On Feb 20, 2007, at 9:16 PM, Hong Zhang wrote: > > > On Tue, 20 Feb 2007, Manav Bhatia wrote: > >>> >> >> So, if I have a problem with a LHS matrix, and I want to use an >> explicit method, then do I have to invert the matrix before asking >> the solver to run? i.e. the solver will not do that for me? 
> > The LHS matrix formulation is not implemented for the explicit method. > Do you have such application? Well, I am working with a conduction heat transfer finite element model, which has the following equation set: [C(t,{T})] d{T}/dt = {F(t,{T})} - [K(t,{T})] {T} with initial conditions {T(0)} = {T0} So, I have a problem which has a LHS matrix, which is also dependent on the primary variable (which is temperature {T}). In the simple case, ofcourse, we can neglect this dependence on temperature (for [C]), but that has limited applicability for my problem, since the [C] matrix has non-negligible nonlinearities. So, I am looking for ways to formulate my problem to use the Petsc solvers. The best option that I can think of is to restate the problem as: d{T}/dt = [C(t,{T})]^(-1) ({F(t,{T})} - [K(t,{T})] {T}) where I can now specify the RHS function and its jacobian (I will provide the jacobian, so no need to use finite differencing), and use an explicit / implicit solver. However, if I assume a linear problem, then I am left with a case of [C] d{T}/dt = {F(t)} - [K]{T} Here, I could either restate the problem in the same way as I did above, or I could specify a LHS matrix (in this case [C]), and ask the solver to handle it. But, from our previous email exchanges, it seems like I will have to use an implicit solver for the same, since an explicit solver will not handle a LHS matrix. Kindly correct me if I am wrong. Thanks, Manav From hzhang at mcs.anl.gov Wed Feb 21 09:24:55 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 21 Feb 2007 09:24:55 -0600 (CST) Subject: TS In-Reply-To: References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> Message-ID: Manav, Since you have to solve equation at each time-step, I would suggest using Crank-Nicholson method, which combines implicit and explicit methods and gives higher order of approximation than Euler and Backward Euler methods that petsc supports. However, Crank-Nicholson method is not supported by the petsc release. You must use petsc-dev (see http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/index.html#Obtaining on how to get it). Additional note: in petsc-dev, the interface functions TSSetRHSMatrix() and TSSetLHSMatrix() are replaced by TSSetMatrices(). An example of using cn method is petsc-dev/src/ts/examples/tests/ex1.c See the targets of "runex1_cn_*" in petsc-dev/src/ts/examples/tests/makefile on how to run this example. Use of LHS matrices in petsc is not well tested yet. I've been looking for examples that involve LHS matrix. Would you like contribute your application, or a simplified version of it to us as a test example? We'll put your name in the contributed example. Thanks, Hong > > [C(t,{T})] d{T}/dt = {F(t,{T})} - [K(t,{T})] {T} > > with initial conditions > {T(0)} = {T0} > > So, I have a problem which has a LHS matrix, which is also dependent > on the primary variable (which is temperature {T}). > In the simple case, ofcourse, we can neglect this dependence on > temperature (for [C]), but that has limited applicability for my > problem, since the [C] matrix has non-negligible nonlinearities. > So, I am looking for ways to formulate my problem to use the Petsc > solvers. 
The best option that I can think of is to restate the > problem as: > > d{T}/dt = [C(t,{T})]^(-1) ({F(t,{T})} - [K(t,{T})] {T}) > > where I can now specify the RHS function and its jacobian (I will > provide the jacobian, so no need to use finite differencing), and use > an explicit / implicit solver. > > However, if I assume a linear problem, then I am left with a case of > > [C] d{T}/dt = {F(t)} - [K]{T} > > Here, I could either restate the problem in the same way as I did > above, or I could specify a LHS matrix (in this case [C]), and ask > the solver to handle it. But, from our previous email exchanges, it > seems like I will have to use an implicit solver for the same, since > an explicit solver will not handle a LHS matrix. > > Kindly correct me if I am wrong. > > Thanks, > Manav > > > From diosady at MIT.EDU Wed Feb 21 11:14:24 2007 From: diosady at MIT.EDU (Laslo Tibor Diosady) Date: Wed, 21 Feb 2007 12:14:24 -0500 Subject: Defining my own reordering method for ILU Message-ID: <1172078064.18714.17.camel@splinter.mit.edu> Hi, I want to be able to define an ordering type for the ILU factorization to be used as a preconditioner to GMRES. I have successfully used PCFactorSetMatOrdering(), to set standard types, however I have my own reordering method which I would like to be able to use. Since my reordering method is based on information not found within the matrix and I may want to solve several different systems with the same ordering, what I really want to be able to do is create an index set in advance and pass it is to be used for the reordering. Is this possible, and how would you suggest I go about doing this? Thanks, Laslo From bsmith at mcs.anl.gov Wed Feb 21 11:27:24 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 21 Feb 2007 11:27:24 -0600 (CST) Subject: Defining my own reordering method for ILU In-Reply-To: <1172078064.18714.17.camel@splinter.mit.edu> References: <1172078064.18714.17.camel@splinter.mit.edu> Message-ID: Laslo, You can do your ordering up front to generate an index set. But you will still need to write a routine MatOrdering_MyOrdering() that returns the isrow and iscol, then register the routine with MatOrderingRegisterDynamic(). Then use PCFactorSetMatOrdering() to tell the PC to use your new ordering routine. To have the preconditioner reuse the reordering for several matrices use PCFactorSetReuseOrdering(). Good luck, Barry I know it is a little strange to need to provide the call back function MatOrdering_MyOrdering() that simply returns the ordering you already created but that is the only way to get the ordering into the right place inside the PC objects. The MatOrdering_MyOrdering() simply sets the isrow and iscol pointer and returns. On Wed, 21 Feb 2007, Laslo Tibor Diosady wrote: > Hi, > > I want to be able to define an ordering type for the ILU factorization > to be used as a preconditioner to GMRES. I have successfully used > PCFactorSetMatOrdering(), to set standard types, however I have my own > reordering method which I would like to be able to use. > > Since my reordering method is based on information not found within the > matrix and I may want to solve several different systems with the same > ordering, what I really want to be able to do is create an index set in > advance and pass it is to be used for the reordering. > > Is this possible, and how would you suggest I go about doing this? 
> > Thanks, > > Laslo > > From diosady at MIT.EDU Wed Feb 21 11:53:16 2007 From: diosady at MIT.EDU (Laslo Tibor Diosady) Date: Wed, 21 Feb 2007 12:53:16 -0500 Subject: Defining my own reordering method for ILU In-Reply-To: References: <1172078064.18714.17.camel@splinter.mit.edu> Message-ID: <1172080396.18714.24.camel@splinter.mit.edu> Hi Barry, If I understand how this works then I need to create a function MatOrdering_MyOrdering(Mat mat,const MatOrderingType type,IS *irow,IS *icol) My problem is that in order to compute my matrix ordering I need data other than the matrix as input. I can't see how I can do that if those are the only four arguments passed to the function. Thanks, Laslo On Wed, 2007-02-21 at 11:27 -0600, Barry Smith wrote: > Laslo, > > You can do your ordering up front to generate an index set. > But you will still need to write a routine MatOrdering_MyOrdering() that > returns the isrow and iscol, then register the routine > with MatOrderingRegisterDynamic(). Then use PCFactorSetMatOrdering() > to tell the PC to use your new ordering routine. > > To have the preconditioner reuse the reordering for several matrices > use PCFactorSetReuseOrdering(). > > Good luck, > > Barry > > I know it is a little strange to need to provide the call back function > MatOrdering_MyOrdering() that simply returns the ordering you already > created but that is the only way to get the ordering into the right > place inside the PC objects. > > The MatOrdering_MyOrdering() simply sets the isrow and iscol pointer > and returns. > > On Wed, 21 Feb 2007, Laslo Tibor Diosady wrote: > > > Hi, > > > > I want to be able to define an ordering type for the ILU factorization > > to be used as a preconditioner to GMRES. I have successfully used > > PCFactorSetMatOrdering(), to set standard types, however I have my own > > reordering method which I would like to be able to use. > > > > Since my reordering method is based on information not found within the > > matrix and I may want to solve several different systems with the same > > ordering, what I really want to be able to do is create an index set in > > advance and pass it is to be used for the reordering. > > > > Is this possible, and how would you suggest I go about doing this? > > > > Thanks, > > > > Laslo > > > > > From manav at u.washington.edu Wed Feb 21 13:04:50 2007 From: manav at u.washington.edu (Manav Bhatia) Date: Wed, 21 Feb 2007 11:04:50 -0800 Subject: TS In-Reply-To: References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> Message-ID: <3D428AD2-EFED-4631-A70C-7F9AC5F08A30@u.washington.edu> Hong, In my case, my LHS matrix is also dependent on the primary variable >> [C(t,{T})] d{T}/dt = {F(t,{T})} Would this case be handled by CN? or does it handle only a constant or time dependent LHS matrix? For a constant/time-dependent LHS matrix, the jacobian of the RHS is same as the steady-state nonlinear analysis. Otherwise, I need to change the jacobian definition too, I think. My main concern is: If I use the restated problem (from my previous mail) >> d{T}/dt = [C(t,{T})]^(-1) ({F(t,{T})} - [K(t,{T})] {T}) I know how to calculate the jacobian of the RHS, even though it requires inversion. With this, I can use any of the explicit/implicit methods. But if I do not state the problem like this, what do I do about the LHS matrix that is dependent on the primary variable? I would be happy to contribute an example. But it will have to wait a about 2 weeks till I can start working on it. 
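As a rough illustration of the restated form d{T}/dt = [C]^{-1}({F} - [K]{T}) mentioned above: the action of [C]^{-1} would normally be applied by solving with [C] inside the RHS callback rather than by forming an explicit inverse. The sketch below is not from this thread; AppCtx, ksp_C and the assembly step are placeholders, and the 4-argument KSPSetOperators() is the 2.3.x form.

    #include "petscksp.h"
    #include "petscts.h"

    typedef struct {
      Mat C, K;      /* current mass-like and stiffness-like matrices */
      Vec F, work;   /* load vector and scratch storage */
      KSP ksp_C;     /* linear solver used to apply C^{-1} */
    } AppCtx;

    /* Tdot = C(t,T)^{-1} ( F(t,T) - K(t,T) T ), with C^{-1} applied via a solve */
    PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec T,Vec Tdot,void *ctx)
    {
      AppCtx         *user = (AppCtx*)ctx;
      PetscErrorCode ierr;

      /* user-provided reassembly of C, K and F at (t,T) would go here */

      ierr = MatMult(user->K,T,user->work);CHKERRQ(ierr);         /* work = K T         */
      ierr = VecAYPX(user->work,-1.0,user->F);CHKERRQ(ierr);      /* work = F - K T     */
      ierr = KSPSetOperators(user->ksp_C,user->C,user->C,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSolve(user->ksp_C,user->work,Tdot);CHKERRQ(ierr); /* Tdot = C^{-1} work */
      return 0;
    }

When [C] depends on {T} this reassembly and solve happen at every RHS evaluation, which is part of why having TS handle the LHS matrix directly, as discussed in this thread, is attractive.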
Thanks, Manav On Feb 21, 2007, at 7:24 AM, Hong Zhang wrote: > Manav, > > Since you have to solve equation at each time-step, > I would suggest using Crank-Nicholson method, which combines > implicit and explicit methods and gives > higher order of approximation than Euler and Backward Euler > methods that petsc supports. > > However, Crank-Nicholson method is not supported by the > petsc release. You must use petsc-dev > (see > http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/ > index.html#Obtaining > on how to get it). > > Additional note: in petsc-dev, the interface functions > TSSetRHSMatrix() and TSSetLHSMatrix() are replaced by TSSetMatrices(). > An example of using cn method is petsc-dev/src/ts/examples/tests/ex1.c > See the targets of > "runex1_cn_*" in petsc-dev/src/ts/examples/tests/makefile > on how to run this example. > > Use of LHS matrices in petsc is not well tested yet. > I've been looking for examples that involve LHS matrix. > Would you like contribute your application, or a simplified version > of it to us as a test example? We'll put your name > in the contributed example. > > Thanks, > > Hong >> >> [C(t,{T})] d{T}/dt = {F(t,{T})} - [K(t,{T})] {T} >> >> with initial conditions >> {T(0)} = {T0} >> >> So, I have a problem which has a LHS matrix, which is also dependent >> on the primary variable (which is temperature {T}). >> In the simple case, ofcourse, we can neglect this dependence on >> temperature (for [C]), but that has limited applicability for my >> problem, since the [C] matrix has non-negligible nonlinearities. >> So, I am looking for ways to formulate my problem to use the Petsc >> solvers. The best option that I can think of is to restate the >> problem as: >> >> d{T}/dt = [C(t,{T})]^(-1) ({F(t,{T})} - [K(t,{T})] {T}) >> >> where I can now specify the RHS function and its jacobian (I will >> provide the jacobian, so no need to use finite differencing), and use >> an explicit / implicit solver. >> >> However, if I assume a linear problem, then I am left with a case of >> >> [C] d{T}/dt = {F(t)} - [K]{T} >> >> Here, I could either restate the problem in the same way as I did >> above, or I could specify a LHS matrix (in this case [C]), and ask >> the solver to handle it. But, from our previous email exchanges, it >> seems like I will have to use an implicit solver for the same, since >> an explicit solver will not handle a LHS matrix. >> >> Kindly correct me if I am wrong. >> >> Thanks, >> Manav >> >> >> > From hzhang at mcs.anl.gov Wed Feb 21 14:13:42 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 21 Feb 2007 14:13:42 -0600 (CST) Subject: TS In-Reply-To: <3D428AD2-EFED-4631-A70C-7F9AC5F08A30@u.washington.edu> References: <7A0AE5E2-4EB8-493E-AF41-14F3CCE3152A@u.washington.edu> <29AB5E28-BBDB-47A0-AA8E-52D4946F41E4@u.washington.edu> <3D428AD2-EFED-4631-A70C-7F9AC5F08A30@u.washington.edu> Message-ID: Manav, > > >> [C(t,{T})] d{T}/dt = {F(t,{T})} The current code should be able to handle [C(t_n,{T_n})] {T_(n+1) - T_n}/dt = {F(t,{T})}, i.e., the LHS matrix uses explicit scheme. As I mentioned, the codes for cn and LHS matrix are buggy and not sufficiently tested. Additional coding is likely needed. How about start from the formulation > > >> d{T}/dt = [C(t,{T})]^(-1) ({F(t,{T})} - [K(t,{T})] {T}) and get a working code. Then pass it to me. I'll use it to test the above LHS matrix formulation and improve petsc cn method. > I would be happy to contribute an example. But it will have to wait a > about 2 weeks till I can start working on it. 
Fine with us. We can help to optimize it. Hong > > > On Feb 21, 2007, at 7:24 AM, Hong Zhang wrote: > > > Manav, > > > > Since you have to solve equation at each time-step, > > I would suggest using Crank-Nicholson method, which combines > > implicit and explicit methods and gives > > higher order of approximation than Euler and Backward Euler > > methods that petsc supports. > > > > However, Crank-Nicholson method is not supported by the > > petsc release. You must use petsc-dev > > (see > > http://www-unix.mcs.anl.gov/petsc/petsc-as/developers/ > > index.html#Obtaining > > on how to get it). > > > > Additional note: in petsc-dev, the interface functions > > TSSetRHSMatrix() and TSSetLHSMatrix() are replaced by TSSetMatrices(). > > An example of using cn method is petsc-dev/src/ts/examples/tests/ex1.c > > See the targets of > > "runex1_cn_*" in petsc-dev/src/ts/examples/tests/makefile > > on how to run this example. > > > > Use of LHS matrices in petsc is not well tested yet. > > I've been looking for examples that involve LHS matrix. > > Would you like contribute your application, or a simplified version > > of it to us as a test example? We'll put your name > > in the contributed example. > > > > Thanks, > > > > Hong > >> > >> [C(t,{T})] d{T}/dt = {F(t,{T})} - [K(t,{T})] {T} > >> > >> with initial conditions > >> {T(0)} = {T0} > >> > >> So, I have a problem which has a LHS matrix, which is also dependent > >> on the primary variable (which is temperature {T}). > >> In the simple case, ofcourse, we can neglect this dependence on > >> temperature (for [C]), but that has limited applicability for my > >> problem, since the [C] matrix has non-negligible nonlinearities. > >> So, I am looking for ways to formulate my problem to use the Petsc > >> solvers. The best option that I can think of is to restate the > >> problem as: > >> > >> d{T}/dt = [C(t,{T})]^(-1) ({F(t,{T})} - [K(t,{T})] {T}) > >> > >> where I can now specify the RHS function and its jacobian (I will > >> provide the jacobian, so no need to use finite differencing), and use > >> an explicit / implicit solver. > >> > >> However, if I assume a linear problem, then I am left with a case of > >> > >> [C] d{T}/dt = {F(t)} - [K]{T} > >> > >> Here, I could either restate the problem in the same way as I did > >> above, or I could specify a LHS matrix (in this case [C]), and ask > >> the solver to handle it. But, from our previous email exchanges, it > >> seems like I will have to use an implicit solver for the same, since > >> an explicit solver will not handle a LHS matrix. > >> > >> Kindly correct me if I am wrong. > >> > >> Thanks, > >> Manav > >> > >> > >> > > > > From jinzishuai at yahoo.com Wed Feb 21 14:37:58 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Wed, 21 Feb 2007 12:37:58 -0800 (PST) Subject: efficiency of tranposing a Matrix? Message-ID: <20070221203758.2941.qmail@web36206.mail.mud.yahoo.com> Hi there, I have a code that keeps on using the same matrix L and its transpose in all time updates. I can improve the performance of the code by replacing the MatMultTranspose() with MatMult() and computing the transposed matrix at the beginning of the code for only once. The cost is of course extra storage of the transposed matrix. However, I have a question regarding the efficiency of transposing the matrix. I created the Matrix L with MPIAIJ and preallocated the proper memory for it. Then I call MatTranspose(L,<) to compute LT which is the transposed L. 
But I noticed that this process is extremely slow, 6 times slower than the creation of Matrix L itself. The first question is do I need to preallocate the memory for LT also? I didn't do it since I suppose PETSc is smart enough to figure out the necessary storage. Secondly, I am not sure why MatTranspose is so slow. I understand in order to transpose a Matrix, one may need to call MPI_Alltoall which is extremely expensive. But it seems trivial that I can go through a similar process of creating the Matrix L and be much faster. I am not sure how MatTraspose() is implemented and whether I should actually compose LT instead of transpose L. Thank you very much. Shi ____________________________________________________________________________________ Don't get soaked. Take a quick peak at the forecast with the Yahoo! Search weather shortcut. http://tools.search.yahoo.com/shortcuts/#loc_weather From hzhang at mcs.anl.gov Wed Feb 21 15:03:12 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Wed, 21 Feb 2007 15:03:12 -0600 (CST) Subject: efficiency of tranposing a Matrix? In-Reply-To: <20070221203758.2941.qmail@web36206.mail.mud.yahoo.com> References: <20070221203758.2941.qmail@web36206.mail.mud.yahoo.com> Message-ID: Shi, Checking MatTranspose_MPIAIJ(), I find that the preallocation is not implemented. This is likely the reason of slowdown. > > I have a code that keeps on using the same matrix L > and its transpose in all time updates. > I can improve the performance of the code by replacing > the MatMultTranspose() with MatMult() and computing > the transposed matrix at the beginning of the code for > only once. The cost is of course extra storage of the > transposed matrix. > > However, I have a question regarding the efficiency of > transposing the matrix. I created the Matrix L with > MPIAIJ and preallocated the proper memory for it. > Then I call MatTranspose(L,<) to compute LT which is > the transposed L. But I noticed that this process is > extremely slow, 6 times slower than the creation of > Matrix L itself. > > The first question is do I need to preallocate the > memory for LT also? I didn't do it since I suppose > PETSc is smart enough to figure out the necessary > storage. Preallocation of LT is non-trivial, requring all-to-all communications. I'll add it into MatTranspose_MPIAIJ(). > Secondly, I am not sure why MatTranspose is so slow. I > understand in order to transpose a Matrix, one may > need to call MPI_Alltoall which is extremely > expensive. But it seems trivial that I can go through > a similar process of creating the Matrix L and be much > faster. I am not sure how MatTraspose() is implemented > and whether I should actually compose LT instead of > transpose L. If you know the non-zero structure of LT without communication, creating it directly would outperform petsc MatTranspose(). See MatMatTranspose_MPIAIJ() in petsc/src/mat/impls/aij/mpi/mpiaij.c for details. Hong From bsmith at mcs.anl.gov Wed Feb 21 15:04:14 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 21 Feb 2007 15:04:14 -0600 (CST) Subject: Defining my own reordering method for ILU In-Reply-To: <1172080396.18714.24.camel@splinter.mit.edu> References: <1172078064.18714.17.camel@splinter.mit.edu> <1172080396.18714.24.camel@splinter.mit.edu> Message-ID: Laslo, I would just "cheat" and compute the ordering up front and then just have it as a global variable that you access in the routine. 
The more "PETSc" way would be to use PetscObjectCompose to stick the ordering you have computed in the matrix then PetscObjectQuery() to get it out inside the MyOrdering routine. Barry On Wed, 21 Feb 2007, Laslo Tibor Diosady wrote: > Hi Barry, > > If I understand how this works then I need to create a function > MatOrdering_MyOrdering(Mat mat,const MatOrderingType type,IS *irow,IS > *icol) > > My problem is that in order to compute my matrix ordering I need data > other than the matrix as input. > > I can't see how I can do that if those are the only four arguments > passed to the function. > > Thanks, > > Laslo > > > > On Wed, 2007-02-21 at 11:27 -0600, Barry Smith wrote: > > Laslo, > > > > You can do your ordering up front to generate an index set. > > But you will still need to write a routine MatOrdering_MyOrdering() that > > returns the isrow and iscol, then register the routine > > with MatOrderingRegisterDynamic(). Then use PCFactorSetMatOrdering() > > to tell the PC to use your new ordering routine. > > > > To have the preconditioner reuse the reordering for several matrices > > use PCFactorSetReuseOrdering(). > > > > Good luck, > > > > Barry > > > > I know it is a little strange to need to provide the call back function > > MatOrdering_MyOrdering() that simply returns the ordering you already > > created but that is the only way to get the ordering into the right > > place inside the PC objects. > > > > The MatOrdering_MyOrdering() simply sets the isrow and iscol pointer > > and returns. > > > > On Wed, 21 Feb 2007, Laslo Tibor Diosady wrote: > > > > > Hi, > > > > > > I want to be able to define an ordering type for the ILU factorization > > > to be used as a preconditioner to GMRES. I have successfully used > > > PCFactorSetMatOrdering(), to set standard types, however I have my own > > > reordering method which I would like to be able to use. > > > > > > Since my reordering method is based on information not found within the > > > matrix and I may want to solve several different systems with the same > > > ordering, what I really want to be able to do is create an index set in > > > advance and pass it is to be used for the reordering. > > > > > > Is this possible, and how would you suggest I go about doing this? > > > > > > Thanks, > > > > > > Laslo > > > > > > > > > > From diosady at MIT.EDU Wed Feb 21 19:36:05 2007 From: diosady at MIT.EDU (Laslo T. Diosady) Date: Wed, 21 Feb 2007 20:36:05 -0500 Subject: Defining my own reordering method for ILU In-Reply-To: References: <1172078064.18714.17.camel@splinter.mit.edu> <1172080396.18714.24.camel@splinter.mit.edu> Message-ID: Hi Barry, I tried the "PETSc" way and was successful. Thanks for the help, Laslo On Feb 21, 2007, at 4:04 PM, Barry Smith wrote: > > Laslo, > > I would just "cheat" and compute the ordering up front and > then just have it as a global variable that you access in the routine. > > The more "PETSc" way would be to use PetscObjectCompose to stick the > ordering you have computed in the matrix then PetscObjectQuery() to get > it out inside the MyOrdering routine. > > Barry > > On Wed, 21 Feb 2007, Laslo Tibor Diosady wrote: > >> Hi Barry, >> >> If I understand how this works then I need to create a function >> MatOrdering_MyOrdering(Mat mat,const MatOrderingType type,IS *irow,IS >> *icol) >> >> My problem is that in order to compute my matrix ordering I need data >> other than the matrix as input. >> >> I can't see how I can do that if those are the only four arguments >> passed to the function. 
>> >> Thanks, >> >> Laslo >> >> >> >> On Wed, 2007-02-21 at 11:27 -0600, Barry Smith wrote: >>> Laslo, >>> >>> You can do your ordering up front to generate an index set. >>> But you will still need to write a routine MatOrdering_MyOrdering() >>> that >>> returns the isrow and iscol, then register the routine >>> with MatOrderingRegisterDynamic(). Then use PCFactorSetMatOrdering() >>> to tell the PC to use your new ordering routine. >>> >>> To have the preconditioner reuse the reordering for several matrices >>> use PCFactorSetReuseOrdering(). >>> >>> Good luck, >>> >>> Barry >>> >>> I know it is a little strange to need to provide the call back >>> function >>> MatOrdering_MyOrdering() that simply returns the ordering you already >>> created but that is the only way to get the ordering into the right >>> place inside the PC objects. >>> >>> The MatOrdering_MyOrdering() simply sets the isrow and iscol pointer >>> and returns. >>> >>> On Wed, 21 Feb 2007, Laslo Tibor Diosady wrote: >>> >>>> Hi, >>>> >>>> I want to be able to define an ordering type for the ILU >>>> factorization >>>> to be used as a preconditioner to GMRES. I have successfully used >>>> PCFactorSetMatOrdering(), to set standard types, however I have my >>>> own >>>> reordering method which I would like to be able to use. >>>> >>>> Since my reordering method is based on information not found within >>>> the >>>> matrix and I may want to solve several different systems with the >>>> same >>>> ordering, what I really want to be able to do is create an index >>>> set in >>>> advance and pass it is to be used for the reordering. >>>> >>>> Is this possible, and how would you suggest I go about doing this? >>>> >>>> Thanks, >>>> >>>> Laslo >>>> >>>> >>> >> >> > From zonexo at gmail.com Thu Feb 22 01:33:23 2007 From: zonexo at gmail.com (Ben Tay) Date: Thu, 22 Feb 2007 15:33:23 +0800 Subject: Using Compaq visual fortran with PETSc and not installing Intel MKL/MPICH Message-ID: <804ab5d40702212333l71dccda4yab44aa730effae9a@mail.gmail.com> Hi, I have been using PETSc with visual fortran/intel mkl/mpich installed. This has the same configuration as the configuration file .dsw supplied by PETSc. However, now using another of my school's computer, MKL and MPICH are not installed. Is there anyway I can still use compaq visual fortran with PETSc? By using --download-f-blas-lapack=1, how can I make them work? Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Thu Feb 22 09:15:57 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Thu, 22 Feb 2007 09:15:57 -0600 (CST) Subject: Using Compaq visual fortran with PETSc and not installing Intel MKL/MPICH In-Reply-To: <804ab5d40702212333l71dccda4yab44aa730effae9a@mail.gmail.com> References: <804ab5d40702212333l71dccda4yab44aa730effae9a@mail.gmail.com> Message-ID: On Thu, 22 Feb 2007, Ben Tay wrote: > Hi, > > I have been using PETSc with visual fortran/intel mkl/mpich installed. This > has the same configuration as the configuration file .dsw supplied by PETSc. > However, now using another of my school's computer, MKL and MPICH are not > installed. > > Is there anyway I can still use compaq visual fortran with PETSc? By using > --download-f-blas-lapack=1, how can I make them work? For blas/lapack the above should work, but for MPI - you'll have to either install mpich or use --with-mpi=0. 
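Concretely, that means a configure line along the following lines, keeping whatever --with-cc/--with-fc settings already work for the Compaq Visual Fortran build (the placeholders are site specific and are not from this thread), followed by the usual make and make test:

    ./config/configure.py --with-cc=<your C compiler> --with-fc=<your CVF compiler> \
        --download-f-blas-lapack=1 --with-mpi=0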
Satish From hzhang at mcs.anl.gov Thu Feb 22 16:35:54 2007 From: hzhang at mcs.anl.gov (Hong Zhang) Date: Thu, 22 Feb 2007 16:35:54 -0600 (CST) Subject: efficiency of tranposing a Matrix? In-Reply-To: <20070221203758.2941.qmail@web36206.mail.mud.yahoo.com> References: <20070221203758.2941.qmail@web36206.mail.mud.yahoo.com> Message-ID: Shi, I added preallocation to MatTranspose_MPIAIJ(), in which, d_nnz is computed from L, but o_nnz is set as d_nnz. This avoids data communication, and allocates sufficient space in most cases, I believe :-) You may either get petsc-dev, or replace MatTranspose_MPIAIJ() in your ~petsc/src/mat/impls/aij/mpi/mpiaij.c with the one attached. Then rebuild the petsc lib. Let us know if you still have slow down in MatTranspose(). Hong On Wed, 21 Feb 2007, Shi Jin wrote: > Hi there, > > I have a code that keeps on using the same matrix L > and its transpose in all time updates. > I can improve the performance of the code by replacing > the MatMultTranspose() with MatMult() and computing > the transposed matrix at the beginning of the code for > only once. The cost is of course extra storage of the > transposed matrix. > > However, I have a question regarding the efficiency of > transposing the matrix. I created the Matrix L with > MPIAIJ and preallocated the proper memory for it. > Then I call MatTranspose(L,<) to compute LT which is > the transposed L. But I noticed that this process is > extremely slow, 6 times slower than the creation of > Matrix L itself. > > The first question is do I need to preallocate the > memory for LT also? I didn't do it since I suppose > PETSc is smart enough to figure out the necessary > storage. > > Secondly, I am not sure why MatTranspose is so slow. I > understand in order to transpose a Matrix, one may > need to call MPI_Alltoall which is extremely > expensive. But it seems trivial that I can go through > a similar process of creating the Matrix L and be much > faster. I am not sure how MatTraspose() is implemented > and whether I should actually compose LT instead of > transpose L. > > Thank you very much. > > Shi > > > > ____________________________________________________________________________________ > Don't get soaked. Take a quick peak at the forecast > with the Yahoo! Search weather shortcut. > http://tools.search.yahoo.com/shortcuts/#loc_weather > > -------------- next part -------------- A non-text attachment was scrubbed... Name: mattranspose.c Type: text/x-csrc Size: 2314 bytes Desc: mattranspose.c URL: From niriedith at gmail.com Fri Feb 23 09:08:42 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Fri, 23 Feb 2007 11:08:42 -0400 Subject: about Unstructured Meshes Message-ID: Hi !! I'm new here...and i'm new a petsc user.. :P I want to know if petsc has support for meshes... I was reading about that and i need create a mesh but all the software that i see are comercial ...so I use petsc for linear solver and matrices so if petsc create meshes it would be but easy for me .. so.. help me ! :D Thanks anyway... -------------- next part -------------- An HTML attachment was scrubbed... URL: From niriedith at gmail.com Fri Feb 23 09:11:54 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Fri, 23 Feb 2007 11:11:54 -0400 Subject: about SAMG Message-ID: Hi again :P how can work whit the algebraic multigrid in petsc? i really don't know :( thaks :D -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Feb 23 09:13:23 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 23 Feb 2007 09:13:23 -0600 Subject: about Unstructured Meshes In-Reply-To: References: Message-ID: On 2/23/07, Niriedith Karina wrote: > Hi !! > > I'm new here...and i'm new a petsc user.. :P > I want to know if petsc has support for meshes... > I was reading about that and i need create a mesh but all the software that > i see are comercial ...so I use petsc for linear solver and matrices so if > petsc create meshes it would be but easy for me .. so.. > help me ! :D 1) PETSc does not make meshes, but there are good free meshing packages, Triangle in 2D and TetGen in 3D. 2) The unstructured mesh support in PETSc is very new, and at this point is probably only usable by expert programmers. If you feel up to it, take a look at the examples in src/dm/mesh/examples/tutorials. Otherwise, you can use the packages above and manage the construction of Mats and Vecs yourself. Thanks, Matt > Thanks anyway... From knepley at gmail.com Fri Feb 23 09:14:53 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 23 Feb 2007 09:14:53 -0600 Subject: about SAMG In-Reply-To: References: Message-ID: On 2/23/07, Niriedith Karina wrote: > Hi again :P > > how can work whit the algebraic multigrid in petsc? i really don't know :( You need to configure with an AMG package, for instance --download-hypre. Then the solver will be available -pc_type hypre -pc_hypre_type boomeramg. Matt > thaks :D > From niriedith at gmail.com Fri Feb 23 09:50:59 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Fri, 23 Feb 2007 11:50:59 -0400 Subject: about Unstructured Meshes In-Reply-To: References: Message-ID: oka oka... thaks but... i don't understand very well.... what can i do with petsc and the unstructured mesh support? because you say that petsc does not make meshes so what it's new about the meshes in petsc ? i'm sorry but i'm very new in this area... thaks again... On 2/23/07, Matthew Knepley wrote: > > On 2/23/07, Niriedith Karina wrote: > > Hi !! > > > > I'm new here...and i'm new a petsc user.. :P > > I want to know if petsc has support for meshes... > > I was reading about that and i need create a mesh but all the software > that > > i see are comercial ...so I use petsc for linear solver and matrices so > if > > petsc create meshes it would be but easy for me .. so.. > > help me ! :D > > 1) PETSc does not make meshes, but there are good free meshing packages, > Triangle in 2D and TetGen in 3D. > > 2) The unstructured mesh support in PETSc is very new, and at this point > is > probably only usable by expert programmers. If you feel up to it, > take a look > at the examples in src/dm/mesh/examples/tutorials. Otherwise, you can > use > the packages above and manage the construction of Mats and Vecs > yourself. > > Thanks, > > Matt > > > Thanks anyway... > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 23 09:53:07 2007 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 23 Feb 2007 09:53:07 -0600 Subject: about Unstructured Meshes In-Reply-To: References: Message-ID: On 2/23/07, Niriedith Karina wrote: > oka oka... > thaks > but... i don't understand very well.... > what can i do with petsc and the unstructured mesh support? because you say > that petsc does not make meshes so what it's new about the meshes in petsc ? > i'm sorry but i'm very new in this area... Then I think you should not try out the PETSc stuff yet. 
I would just get the appropriate mesh generator and go from there. The extra support is for construction of functions and operators over a mesh, but you can do that yourself after generating them. Matt > thaks again... > > > On 2/23/07, Matthew Knepley wrote: > > On 2/23/07, Niriedith Karina < niriedith at gmail.com> wrote: > > > Hi !! > > > > > > I'm new here...and i'm new a petsc user.. :P > > > I want to know if petsc has support for meshes... > > > I was reading about that and i need create a mesh but all the software > that > > > i see are comercial ...so I use petsc for linear solver and matrices so > if > > > petsc create meshes it would be but easy for me .. so.. > > > help me ! :D > > > > 1) PETSc does not make meshes, but there are good free meshing packages, > > Triangle in 2D and TetGen in 3D. > > > > 2) The unstructured mesh support in PETSc is very new, and at this point > is > > probably only usable by expert programmers. If you feel up to it, > > take a look > > at the examples in src/dm/mesh/examples/tutorials. Otherwise, you can > use > > the packages above and manage the construction of Mats and Vecs > yourself. > > > > Thanks, > > > > Matt > > > > > Thanks anyway... > > > > > > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie From niriedith at gmail.com Tue Feb 27 08:23:20 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Tue, 27 Feb 2007 10:23:20 -0400 Subject: about mesh generators Message-ID: Hi!! I read about a software Hexgen .... Anyone knows where find this software...is very very important because i need that a software for meshes hexa... Thanks !! -------------- next part -------------- An HTML attachment was scrubbed... URL: From niriedith at gmail.com Tue Feb 27 10:24:33 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Tue, 27 Feb 2007 12:24:33 -0400 Subject: about AMG Message-ID: Hi! How configure hypre with petsc?... I have installed hypre in the cluster...but when i run a program in petsc with -pc_type hypre -pc_type_hypre boomeramg it doesn't work :( Help me! :( -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Feb 27 11:09:22 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 Feb 2007 11:09:22 -0600 (CST) Subject: about AMG In-Reply-To: References: Message-ID: Add the config/configure.py option --download-hypre Note that the hypre from the LLNL has some bugs that make it unusable from PETSc. Barry On Tue, 27 Feb 2007, Niriedith Karina wrote: > Hi! > > How configure hypre with petsc?... > > I have installed hypre in the cluster...but when i run a program in petsc > with -pc_type hypre -pc_type_hypre boomeramg it doesn't work :( > Help me! 
:( > From niriedith at gmail.com Tue Feb 27 11:31:53 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Tue, 27 Feb 2007 13:31:53 -0400 Subject: about AMG In-Reply-To: References: Message-ID: i did that...and the configuration was successful ./configure and then make install but it doesn't work :( I think that may be the dir where i did it is the problem... petsc in /opt/petsc/ hypre in /opt/hypre/ (sorry also i'm new in english and linux :P ) Help me.... :( -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 27 12:49:11 2007 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Feb 2007 12:49:11 -0600 Subject: about AMG In-Reply-To: References: Message-ID: If you have a configure problem, please send configure.log to petsc-maint at mcs.anl.gov. Matt On 2/27/07, Niriedith Karina wrote: > i did that...and the configuration was successful > ./configure > and then > make install > > but it doesn't work :( > I think that may be the dir where i did it is the problem... > petsc in /opt/petsc/ > hypre in /opt/hypre/ > (sorry also i'm new in english and linux :P ) > > Help me.... > :( > -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie From dalcinl at gmail.com Tue Feb 27 13:35:41 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 27 Feb 2007 16:35:41 -0300 Subject: about AMG In-Reply-To: References: Message-ID: In case this helps you, here goes what I usually do for installing petsc in our cluster, in a central location. Make sure 'mpicc' is in your $PATH, the first line is to be sure about this. $ which mpicc /usr/local/mpich2/1.0.5/bin/mpicc $ tar -zxf petsc-2.3.2-p8.tar.gz $ cd petsc-2.3.2-p8 $ export PETSC_DIR=`pwd` $ export PETSC_ARCH=linux-gnu $ touch ~/.hypre_license $ python config/configure.py --prefix=/usr/local/petsc/2.3.2 --with-shared=1 --with-hypre=1 --download-hypre=ifneeded $ make $ su -c 'make install' $ export PETSC_DIR=/usr/local/petsc/2.3.2 $ make test On 2/27/07, Niriedith Karina wrote: > i did that...and the configuration was successful > ./configure > and then > make install > > but it doesn't work :( > I think that may be the dir where i did it is the problem... > petsc in /opt/petsc/ > hypre in /opt/hypre/ > (sorry also i'm new in english and linux :P ) > > Help me.... > :( > -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From bsmith at mcs.anl.gov Tue Feb 27 15:45:28 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 27 Feb 2007 15:45:28 -0600 (CST) Subject: about AMG In-Reply-To: References: Message-ID: Please send to petsc-maint at mcs.anl.gov configure.log and make_* from building PETSc. 
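For when the build does go through: once PETSc has been configured with --download-hypre, BoomerAMG is only a runtime selection. A minimal sketch of making that choice from code (the wrapper name SolveWithBoomerAMG is made up here, and the 4-argument KSPSetOperators() is the 2.3.x calling sequence):

    #include "petscksp.h"

    /* Solve A x = b with GMRES preconditioned by hypre BoomerAMG. */
    PetscErrorCode SolveWithBoomerAMG(Mat A,Vec b,Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
      ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCHYPRE);CHKERRQ(ierr);
      ierr = PCHYPRESetType(pc,"boomeramg");CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* command-line options can still override */
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
      ierr = KSPDestroy(ksp);CHKERRQ(ierr);
      return 0;
    }

The equivalent with no code changes at all is -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg on the command line, which is usually the more flexible route.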
Barry On Tue, 27 Feb 2007, Niriedith Karina wrote: > i did that...and the configuration was successful > ./configure > and then > make install > > but it doesn't work :( > I think that may be the dir where i did it is the problem... > petsc in /opt/petsc/ > hypre in /opt/hypre/ > (sorry also i'm new in english and linux :P ) > > Help me.... > :( > From niriedith at gmail.com Tue Feb 27 15:54:22 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Tue, 27 Feb 2007 17:54:22 -0400 Subject: algebraic multigrid preconditioner Message-ID: Hi! I need a linear solver with GMRES and a Algebraiv multigrid I read about SAMG y BoomerAMG...but i dont know how use SAMG and with Hypre i had some problem to configure....also you say that hypre has some bugs that make it unusable from PETSc...so my question how can i do that? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 27 15:57:39 2007 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 27 Feb 2007 15:57:39 -0600 Subject: algebraic multigrid preconditioner In-Reply-To: References: Message-ID: On 2/27/07, Niriedith Karina wrote: > Hi! > > I need a linear solver with GMRES and a Algebraiv multigrid > I read about SAMG y BoomerAMG...but i dont know how use SAMG > and with Hypre i had some problem to configure....also you say that hypre > has some bugs that make it unusable > from PETSc...so my question how can i do that? > > Thanks! You must reconfigure using --download-hypre. If you have a problem, send the configure.log to petsc-maint at mcs.anl.gov. Matt -- One trouble is that despite this system, anyone who reads journals widely and critically is forced to realize that there are scarcely any bars to eventual publication. There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. -- Drummond Rennie From niriedith at gmail.com Tue Feb 27 16:54:50 2007 From: niriedith at gmail.com (Niriedith Karina ) Date: Tue, 27 Feb 2007 18:54:50 -0400 Subject: algebraic multigrid preconditioner In-Reply-To: References: Message-ID: Finally I understood =D ...but...i have a problem...my account in the cluster is limited...i use petsc but it's installed previously in the university i have a month using petsc...so...my question is it's the only way to do that right,( reinstall petsc)? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Tue Feb 27 17:41:11 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 27 Feb 2007 20:41:11 -0300 Subject: algebraic multigrid preconditioner In-Reply-To: References: Message-ID: Well, in our cluster, the debug version need [dalcinl at aquiles ~]$ du -sh /usr/local/petsc/dev/lib/linux-gnu 163M /usr/local/petsc/dev/lib/linux-gnu but the optimized version only need [dalcinl at aquiles ~]$ du -sh /usr/local/petsc/dev/lib/linux-gnu-O 18M /usr/local/petsc/dev/lib/linux-gnu-O Can you afford to have 20M in your cluster account? Of course, you will need many more space for building PETSc, but perhaps you can do that on /tmp or some public scratch space. 
From dalcinl at gmail.com Tue Feb 27 17:41:11 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 27 Feb 2007 20:41:11 -0300 Subject: algebraic multigrid preconditioner In-Reply-To: References: Message-ID: Well, in our cluster, the debug version needs
[dalcinl at aquiles ~]$ du -sh /usr/local/petsc/dev/lib/linux-gnu
163M /usr/local/petsc/dev/lib/linux-gnu
but the optimized version needs only
[dalcinl at aquiles ~]$ du -sh /usr/local/petsc/dev/lib/linux-gnu-O
18M /usr/local/petsc/dev/lib/linux-gnu-O
Can you afford to have 20M in your cluster account? Of course, you will need much more space for building PETSc, but perhaps you can do that on /tmp or some public scratch space. Or even better, ask your cluster sys admin to build PETSc with your specific configure options (in your case, with hypre), using an appropriate name for PETSC_ARCH, for example PETSC_ARCH=linux-gnu-O-hypre. On 2/27/07, Niriedith Karina wrote: > Finally I understood =D ... but... I have a problem... my account on the > cluster is limited... I use PETSc, but it was installed previously by the > university; I have been using PETSc for a month... so my question: is reinstalling > PETSc the only way to do that? > > Thanks! > -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From jinzishuai at yahoo.com Wed Feb 28 16:14:19 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Wed, 28 Feb 2007 14:14:19 -0800 (PST) Subject: Memory allocated by PETSC? Message-ID: <414054.87968.qm@web36208.mail.mud.yahoo.com> Hi, I am curious how much extra memory PETSc allocates in the background, since my estimate of memory usage of the code is much smaller than what I see when it runs. So I did this simple test: First I used PETSc to dump a matrix in binary format into a file. The file has a size of 13MB. I assume this should be the same size that is used to store the matrix in memory. Then I wrote a simple code that does nothing but load this matrix from the file by MatLoad(). However, I found that the code consumes 29MB of memory (VIRT=29M from top) using a single process. This is confirmed by the -malloc_log option where it says Maximum memory PetscMalloc()ed 29246912 maximum size of entire process 0 I've attached the output of the code with detailed malloc information. Could you please explain to me the difference of over two times? I don't want to criticize anything but need a clear idea of how much memory is needed so that I know whether there is a chance for me to reduce the memory usage of my production code. Thank you very much. Shi -------------- next part -------------- A non-text attachment was scrubbed... Name: out Type: application/octet-stream Size: 1455 bytes Desc: 1857821269-out URL: From balay at mcs.anl.gov Wed Feb 28 16:51:38 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 28 Feb 2007 16:51:38 -0600 (CST) Subject: Memory allocated by PETSC? In-Reply-To: <414054.87968.qm@web36208.mail.mud.yahoo.com> References: <414054.87968.qm@web36208.mail.mud.yahoo.com> Message-ID: > Maximum memory PetscMalloc()ed 29246912 maximum size of entire process 0 The choice of wording here is a bit misleading. PETSc is using getrusage(ru_maxrss) - which is resident set size. [so top should show similar numbers for RSS] This might include both the code segment and the data segments - and the code segment part could be a few MB, perhaps up to 10 MB. 0: [0] 10 15321472 MatSeqAIJSetPreallocation_SeqAIJ() This indicates that the matrix is taking approximately 15MB. And there are other data structures that are taking about another couple of MB of space. Depending upon how malloc()/free() is implemented in the OS, some of the freed memory might not immediately reflect on the RSS count. Hope this helps.. Satish
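For anyone trying to reproduce this kind of accounting: PETSc can report its own allocation totals alongside the OS numbers, which helps separate "memory PetscMalloc()ed" from the resident set size that top shows. The routine below is a sketch, not code from the thread; the query functions named here are taken from recent PETSc releases and are assumed to have equivalents in the 2.3.x series, and the malloc figures are only meaningful when PETSc's malloc tracking is active (a debug build, or the -malloc option).

  /* Sketch: compare PETSc's own allocation count with the OS view of the process.
     Routine names follow recent PETSc releases (an assumption, see above);
     'label' is just an arbitrary tag for the printout. */
  #include "petscsys.h"   /* the corresponding header is "petsc.h" in the 2.3.x series */

  PetscErrorCode report_memory(const char *label)
  {
    PetscLogDouble mallocd, process;
    PetscErrorCode ierr;

    ierr = PetscMallocGetCurrentUsage(&mallocd);CHKERRQ(ierr); /* bytes currently obtained through PetscMalloc() */
    ierr = PetscMemoryGetCurrentUsage(&process);CHKERRQ(ierr); /* what the OS reports for the whole process */
    ierr = PetscPrintf(PETSC_COMM_WORLD, "%s: PetscMalloc()ed %g bytes, process size %g bytes\n",
                       label, mallocd, process);CHKERRQ(ierr);
    return 0;
  }

Calling something like this before and after MatLoad() shows how much of the gap is PETSc's own allocation and how much is code segments, MPI buffers and allocator overhead - a split the single high-water-mark line printed by -malloc_log cannot make.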
On Wed, 28 Feb 2007, Shi Jin wrote: > Hi, > I am curious how much extra memory PETSc allocates in > the background, since my estimate of memory usage of > the code is much smaller than what I see when it runs. > So I did this simple test: > First I used PETSc to dump a matrix in binary format > into a file. The file has a size of 13MB. I assume > this should be the same size that is used to store the > matrix in memory. Then I wrote a simple code that does > nothing but load this matrix from the file by > MatLoad(). However, I found that the code consumes > 29MB of memory (VIRT=29M from top) using a single > process. > This is confirmed by the -malloc_log option where it > says > Maximum memory PetscMalloc()ed 29246912 maximum size > of entire process 0 > I've attached the output of the code with detailed > malloc information. > Could you please explain to me the difference > of over two times? > I don't want to criticize anything but need a clear > idea of how much memory is needed so that I know > whether there is a chance for me to reduce the memory > usage of my production code. > Thank you very much. > > Shi From bsmith at mcs.anl.gov Wed Feb 28 16:59:51 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 Feb 2007 16:59:51 -0600 (CST) Subject: Memory allocated by PETSC? In-Reply-To: <414054.87968.qm@web36208.mail.mud.yahoo.com> References: <414054.87968.qm@web36208.mail.mud.yahoo.com> Message-ID: Shi, The current algorithm used to do a MatLoad_MPIAIJ requires memory on each process of about TWICE the memory required just for the matrix. For example, if a matrix requires 40 megabytes total, after it is completely loaded on 4 processes it will take about 10 megabytes on each process, BUT during MatLoad each process will use 20 megabytes (10 for the final matrix and 10 for work space to receive messages in). This is why the total is 29 meg: 15meg for the final matrix and around 18meg for the MatLoad. Barry We could work hard and reduce the amount of memory used during the load process if this is a problem for you. We are not fans of loading huge matrices from files, so generally this is not a problem. On Wed, 28 Feb 2007, Shi Jin wrote: > Hi, > I am curious how much extra memory PETSc allocates in > the background, since my estimate of memory usage of > the code is much smaller than what I see when it runs. > So I did this simple test: > First I used PETSc to dump a matrix in binary format > into a file. The file has a size of 13MB. I assume > this should be the same size that is used to store the > matrix in memory. Then I wrote a simple code that does > nothing but load this matrix from the file by > MatLoad(). However, I found that the code consumes > 29MB of memory (VIRT=29M from top) using a single > process. > This is confirmed by the -malloc_log option where it > says > Maximum memory PetscMalloc()ed 29246912 maximum size > of entire process 0 > I've attached the output of the code with detailed > malloc information. > Could you please explain to me the > difference > of over two times? > I don't want to criticize anything but need a > clear > idea of how much memory is needed so that I know > whether there is a chance for me to reduce the > memory > usage of my production code. > Thank you very much.
> > Shi From jinzishuai at yahoo.com Wed Feb 28 17:16:15 2007 From: jinzishuai at yahoo.com (Shi Jin) Date: Wed, 28 Feb 2007 15:16:15 -0800 (PST) Subject: Memory allocated by PETSC? In-Reply-To: Message-ID: <168399.47181.qm@web36205.mail.mud.yahoo.com> Thank you very much. This is very helpful. So the mismatch in size only comes from MatLoad()? I am actually not a big fan of loading the matrices either. I used it just to do some tests. There is no need to change the implementation for me at all. So can I say that if I am going to construct the same matrix in the code using MatCreateMPIAIJ() and if I do the preallocation exactly, then I should see roughly 15MB of memory used? Thank you. Shi --- Barry Smith wrote: > > Shi, > > The current algorithm used to do a > MatLoad_MPIAIJ requires > memory on each process of about TWICE the memory > required just for > the matrix. For example, if a matrix requires 40 > megabytes total, > after it is completely loaded on 4 processes it will > take about > 10 megabytes on each process, BUT during MatLoad > each process will > use 20 megabytes (10 for the final matrix and 10 for > work space > to receive messages in). This is why the total is 29 > meg: 15meg for the final > matrix and around 18meg for the MatLoad. > > Barry > > We could work hard and reduce the amount of memory > used during the > load process if this is a problem for you. We are not > fans of loading > huge matrices from files, so generally this is not a > problem. > > > On Wed, 28 Feb 2007, Shi Jin wrote: > > > Hi, > > I am curious how much extra memory PETSc allocates > in > > the background, since my estimate of memory usage > of > > the code is much smaller than what I see when it > runs. > > So I did this simple test: > > First I used PETSc to dump a matrix in binary > format > > into a file. The file has a size of 13MB. I assume > > this should be the same size that is used to store > the > > matrix in memory. Then I wrote a simple code that > does > > nothing but load this matrix from the file by > > MatLoad(). However, I found that the code consumes > > 29MB of memory (VIRT=29M from top) using a single > > process. > > This is confirmed by the -malloc_log option where > it > > says > > Maximum memory PetscMalloc()ed 29246912 maximum > size > > of entire process 0 > > I've attached the output of the code with detailed > > malloc information. > > Could you please explain to me the > difference > > of over two times? > > I don't want to criticize anything but need a > clear > > idea of how much memory is needed so that I know > > whether there is a chance for me to reduce the > memory > > usage of my production code. > > Thank you very much. > > > > Shi From balay at mcs.anl.gov Wed Feb 28 17:23:51 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 28 Feb 2007 17:23:51 -0600 (CST) Subject: Memory allocated by PETSC?
In-Reply-To: References: <414054.87968.qm@web36208.mail.mud.yahoo.com> Message-ID: On Wed, 28 Feb 2007, Satish Balay wrote: > > Maximum memory PetscMalloc()ed 29246912 maximum size of entire process 0 > > The choice of wording here is a bit misleading. PETSc is using > getrusage(ru_maxrss) - which is resident set size. [so top should show > similar numbers for RSS] Oops - my comments are wrong here.. RSS here is printed as '0' - so there is a problem with the PETSc code somewhere.. Satish From balay at mcs.anl.gov Wed Feb 28 17:36:46 2007 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 28 Feb 2007 17:36:46 -0600 (CST) Subject: Memory allocated by PETSC? In-Reply-To: <168399.47181.qm@web36205.mail.mud.yahoo.com> References: <168399.47181.qm@web36205.mail.mud.yahoo.com> Message-ID: On Wed, 28 Feb 2007, Shi Jin wrote: > Thank you very much. > This is very helpful. > So the mismatch in size only comes from MatLoad()? > I am actually not a big fan of loading the matrices > either. I used it just to do some tests. There is no > need to change the implementation for me at all. > > So can I say that if I am going to construct the same > matrix in the code using MatCreateMPIAIJ() and if I do > the preallocation exactly, then I should see roughly > 15MB of memory used? More or less.. However, this number [Maximum memory PetscMalloc()ed] corresponds to malloced memory only. RSS numbers would be different. Note that the extra memory in MatLoad() is just temporary - i.e. it's freed at the end of this function. [And in the next stage the solver might take a lot more memory than this temporary MatLoad stuff.] Satish From bsmith at mcs.anl.gov Wed Feb 28 20:39:55 2007 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 28 Feb 2007 20:39:55 -0600 (CST) Subject: Memory allocated by PETSC? In-Reply-To: References: <414054.87968.qm@web36208.mail.mud.yahoo.com> Message-ID: > > Oops - my comments are wrong here.. RSS here is printed as '0' - so > there is a problem with the PETSc code somewhere.. > I could never get Linux to give me this number correctly :-( > Satish > >
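As a footnote to the preallocation question above, here is a rough sketch of creating the same kind of matrix directly with exact preallocation. It is not code from the thread: nlocal, d_nnz and o_nnz are placeholders the application has to fill in (d_nnz[i] and o_nnz[i] being the nonzero counts of local row i inside and outside the local column block), and MatCreateMPIAIJ() is the 2.3.x-era name (later releases call it MatCreateAIJ()). With exact counts, assembly should allocate essentially nothing beyond the final matrix, so the earlier 15MB figure is then a reasonable estimate of the PetscMalloc()ed part; the RSS that top reports will still be larger, as Satish notes.

  /* Sketch: an MPIAIJ matrix with exact per-row preallocation.
     nlocal, d_nnz[], o_nnz[] are placeholders supplied by the application. */
  #include "petscmat.h"

  PetscErrorCode build_matrix(PetscInt nlocal, PetscInt d_nnz[], PetscInt o_nnz[], Mat *A)
  {
    PetscErrorCode ierr;

    ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,
                           nlocal, nlocal,                    /* local rows and columns */
                           PETSC_DETERMINE, PETSC_DETERMINE,  /* let PETSc sum the global sizes */
                           0, d_nnz,                          /* diagonal block: per-row nonzero counts */
                           0, o_nnz,                          /* off-diagonal block: per-row nonzero counts */
                           A);CHKERRQ(ierr);
    /* ... MatSetValues() loop here, then MatAssemblyBegin()/MatAssemblyEnd() ... */
    return 0;
  }

Running with -info prints how many mallocs MatSetValues() needed during assembly; with exact preallocation that count should come out as zero.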