> In order to find where the extra time is consumed, I started from
> ksp/ksp/example/tutorials/ex22.c and changed it one line at a time. I found
> that the time is consumed in the call:
>  ierr = MatSetValuesStencil(B,7,row,7,col,&val[0][0],INSERT_VALUES);CHKERRQ(ierr);

You must have changed more than this one line, because the arrays aren't
that big. This is a finite difference example: it inserts one row at a time.
Inserting 7 rows at a time doesn't make sense.
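
To make the "one row, seven columns" pattern concrete, here is a minimal
sketch in plain C. It assumes natural ordering of an nx*ny*nz grid and an
interior point; grid_index and stencil_row are hypothetical helpers for
illustration, not PETSc functions. The center point sits at position 3 of
the column list, which matches the l=3 case in the quoted message.

```c
#include <assert.h>

/* Hypothetical helper (not PETSc): natural ordering of an nx*ny*nz grid. */
static int grid_index(int i, int j, int k, int nx, int ny)
{
  return i + nx * (j + ny * k);
}

/* Fill cols[] with the seven column indices that a 7-point stencil couples
   to the interior point (i,j,k), and return the single row index.
   One row is set per grid point; cols[3] is the diagonal entry. */
static int stencil_row(int i, int j, int k, int nx, int ny, int nz, int cols[7])
{
  assert(i > 0 && i < nx - 1);
  assert(j > 0 && j < ny - 1);
  assert(k > 0 && k < nz - 1);
  cols[0] = grid_index(i, j, k - 1, nx, ny);
  cols[1] = grid_index(i, j - 1, k, nx, ny);
  cols[2] = grid_index(i - 1, j, k, nx, ny);
  cols[3] = grid_index(i, j, k, nx, ny);     /* diagonal */
  cols[4] = grid_index(i + 1, j, k, nx, ny);
  cols[5] = grid_index(i, j + 1, k, nx, ny);
  cols[6] = grid_index(i, j, k + 1, nx, ny);
  return cols[3];
}
```

Each grid point contributes exactly one row with these seven columns, so a
7-by-7 block of values has no counterpart in the discretization.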

>   This is because I expected the computation time to be 7*1e-2 by this
> repetitive call. However, I find that the execution time is ~1e-2 only for
> l=3. For any other value of l, it is ~1e0. I can see that the only
> speciality of l=3 is that it corresponds to row=(i,j,k). Any other
> combination like (i+1,j,k) causes the call to
>  MatSetValuesBlockedStencil(B,1,&row[l],7,col,&val[0][0],INSERT_VALUES);CHKERRQ(ierr);}
> to be slower by 2 orders of magnitude. Could you please suggest why the
> performance goes down so drastically.

You have to preallocate correctly to get high performance. When you change
the insertion code to set values arbitrarily, the preallocation becomes
incorrect. This is all explained in the link I gave you earlier as well as
the Users Manual.
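
As a rough illustration of why this matters, here is a toy model in plain C.
It is not PETSc's data structure or its growth policy; ToyRow and the toy_*
functions are made up for this sketch. The assumption is only that a row has
room for a fixed number of entries and every insertion beyond that forces a
reallocation and copy.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sparse row: NOT PETSc's storage, just a model of the cost. */
typedef struct {
  int *cols;     /* column indices stored so far */
  int  len;      /* entries in use */
  int  cap;      /* preallocated capacity */
  int  reallocs; /* how many times the storage had to grow */
} ToyRow;

static void toy_row_init(ToyRow *r, int prealloc)
{
  r->cols = NULL;
  if (prealloc > 0) {
    r->cols = malloc((size_t)prealloc * sizeof(int));
    assert(r->cols != NULL);
  }
  r->len = 0;
  r->cap = prealloc;
  r->reallocs = 0;
}

/* Insert a column index; grow by one slot when the capacity is exhausted. */
static void toy_row_insert(ToyRow *r, int col)
{
  for (int n = 0; n < r->len; n++) {
    if (r->cols[n] == col) return; /* entry already exists, nothing to grow */
  }
  if (r->len == r->cap) {
    int *grown = realloc(r->cols, (size_t)(r->cap + 1) * sizeof(int));
    assert(grown != NULL);
    r->cols = grown;
    r->cap++;
    r->reallocs++;
  }
  r->cols[r->len++] = col;
}

static void toy_row_free(ToyRow *r)
{
  free(r->cols);
  r->cols = NULL;
}

/* Insert nnz distinct columns into a row preallocated for `prealloc`
   entries and return the number of reallocations that were needed. */
static int toy_count_reallocs(int prealloc, int nnz)
{
  ToyRow r;
  toy_row_init(&r, prealloc);
  for (int c = 0; c < nnz; c++) toy_row_insert(&r, 10 * c);
  int n = r.reallocs;
  toy_row_free(&r);
  return n;
}
```

In this model a row preallocated for 7 entries takes 7 insertions with no
reallocation, while a row preallocated for 1 entry needs 6. Inserting into
locations that the preallocation did not anticipate has the same effect in
the real matrix. Running with -info reports the number of mallocs during
MatSetValues(), which should be zero when preallocation is correct.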


> I want to assemble a matrix for FEM application. I started from the example
> ksp/ksp/examples/tutorials/ex3.c
> This example shows the similar problem stated above when m>=120.

For simplicity, this example did not preallocate. I have added very naive
preallocation to this example in petsc-dev. If you add the two lines below,
assembly will be fast for large sizes.

changeset:   b961a1bfd123
user:        Jed Brown <jed at 59A2.org>
date:        Tue Jun 21 11:25:30 2011 +0200
summary:     Add naive preallocation to example

diff --git a/src/ksp/ksp/examples/tutorials/ex3.c b/src/ksp/ksp/examples/tutorials/ex3.c
--- a/src/ksp/ksp/examples/tutorials/ex3.c
+++ b/src/ksp/ksp/examples/tutorials/ex3.c
@@ -64,6 +64,8 @@
   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
+  ierr = MatSeqAIJSetPreallocation(A,9,PETSC_NULL);CHKERRQ(ierr);
+  ierr = MatMPIAIJSetPreallocation(A,9,PETSC_NULL,5,PETSC_NULL);CHKERRQ(ierr); /* More than necessary */
   start = rank*(M/size) + ((M%size) < rank ? (M%size) : rank);
   end   = start + M/size + ((M%size) > rank);
