[petsc-users] Scalability of AO ?

Sebastian Steiger steiger at purdue.edu
Wed Mar 9 08:47:57 CST 2011


Hello PETSc experts

I have a parallel application that builds extensively on PETSc
functionality and also uses the AO routines AOCreateMapping and
AOApplicationToPetsc. We are currently doing some benchmarks on Jaguar,
the world's second-fastest computer, where we compute some interior
eigenvalues of a really large matrix (in conjunction with SLEPc).
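
For reference, this is roughly how we use the AO, reduced to a minimal
sketch (made-up indices, petsc-3.1-style calls; in the real code myapp
holds the application's global degree-of-freedom numbers owned by each
rank):

#include "petscao.h"

int main(int argc, char **argv)
{
  AO             ao;
  PetscErrorCode ierr;
  PetscInt       myapp[4] = {7, 3, 12, 0}; /* application DOF numbers owned by this rank (made up) */
  PetscInt       idx[2]   = {12, 7};       /* indices we later need in the PETSc ordering */

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  /* mypetsc = PETSC_NULL: PETSc assigns the contiguous PETSc numbering itself */
  ierr = AOCreateMapping(PETSC_COMM_WORLD, 4, myapp, PETSC_NULL, &ao);CHKERRQ(ierr);
  /* translate application indices to PETSc indices in place */
  ierr = AOApplicationToPetsc(ao, 2, idx);CHKERRQ(ierr);
  ierr = AODestroy(ao);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}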

The application runs fine with 40'000 cores, a matrix size of 400
million, and 20 million AO indices. However, when I scale up to 80'000
cores, a matrix size of 800 million, and 40 million AO indices, I run
out of memory (Jaguar has 1.3 GB/core). I am pretty sure that in our own
code every vector is only as long as the local number of degrees of
freedom, which stays roughly constant at around 10'000.

I figured out that I run out of memory when I call AOCreateMapping.
When I look inside aomapping.c, I see a comment "get all indices on all
processors" near line 330 and some MPI_Allgatherv's. This suggests that
the routine AOApplicationToPetsc is not scalable.
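
With 40 million indices, a single gathered index array is already about
160 MB per rank (at 4 bytes per PetscInt), before whatever additional
arrays the AO keeps on top of that. Rendered in plain MPI, the pattern
I think I am seeing looks roughly like this (hypothetical sketch,
made-up sizes, not the actual PETSc source):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int nlocal = 500;                 /* roughly 40e6 / 80'000 indices per rank */
  int *lens  = malloc(size * sizeof(int));
  int *disp  = malloc(size * sizeof(int));
  MPI_Allgather(&nlocal, 1, MPI_INT, lens, 1, MPI_INT, MPI_COMM_WORLD);

  long long N = 0;
  for (int r = 0; r < size; r++) { disp[r] = (int)N; N += lens[r]; }

  /* every rank allocates and fills an array of the full global length:
     with 40 million 4-byte PetscInts that is ~160 MB per rank */
  int *mylocal = malloc(nlocal * sizeof(int));
  int *allidx  = malloc((size_t)N * sizeof(int));
  for (int k = 0; k < nlocal; k++) mylocal[k] = rank * nlocal + k;
  MPI_Allgatherv(mylocal, nlocal, MPI_INT,
                 allidx, lens, disp, MPI_INT, MPI_COMM_WORLD);

  free(lens); free(disp); free(mylocal); free(allidx);
  MPI_Finalize();
  return 0;
}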

Without having thought about it for too long, it seems to me that this
mapping could be created without communicating all indices to all
processors (I am using PETSC_NULL for the mypetsc argument). Let me know
what you think about this.
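
To make the idea concrete: since mypetsc is PETSC_NULL, each rank
already knows the PETSc index of every application index it
contributes, so the (application, PETSc) pairs could be routed to a
"directory" rank chosen from the application-index value, and later
queries routed and answered the same way. Every rank then stores only
O(N/P) entries instead of the full mapping. A rough plain-MPI sketch of
the creation phase (made-up sizes, not meant as PETSc code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { npl = 4 };                   /* made-up number of AO indices per rank */

int main(int argc, char **argv)
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* with mypetsc == PETSC_NULL the PETSc index of the k-th local entry
     is simply the global offset + k, so every rank knows its own pairs */
  long long N      = (long long)npl * size;   /* global number of AO indices */
  long long offset = (long long)npl * rank;
  long long chunk  = (N + size - 1) / size;   /* app-index value range per directory rank */
  long long app[npl], petsc[npl];
  for (int k = 0; k < npl; k++) {
    app[k]   = N - 1 - (offset + k);          /* some made-up permutation of 0..N-1 */
    petsc[k] = offset + k;
  }

  /* route each (app, petsc) pair to the rank owning app's value range */
  int *sendcnt = calloc(size, sizeof(int));
  int *recvcnt = malloc(size * sizeof(int));
  int *sdispl  = malloc(size * sizeof(int));
  int *rdispl  = malloc(size * sizeof(int));
  int *pos     = malloc(size * sizeof(int));
  for (int k = 0; k < npl; k++) sendcnt[(int)(app[k] / chunk)] += 2;
  MPI_Alltoall(sendcnt, 1, MPI_INT, recvcnt, 1, MPI_INT, MPI_COMM_WORLD);

  int stot = 0, rtot = 0;
  for (int r = 0; r < size; r++) {
    sdispl[r] = stot; stot += sendcnt[r];
    rdispl[r] = rtot; rtot += recvcnt[r];
    pos[r] = sdispl[r];
  }
  long long *sendbuf = malloc(stot * sizeof(long long));
  long long *dir     = malloc(rtot * sizeof(long long));
  for (int k = 0; k < npl; k++) {
    int dest = (int)(app[k] / chunk);
    sendbuf[pos[dest]++] = app[k];
    sendbuf[pos[dest]++] = petsc[k];
  }
  MPI_Alltoallv(sendbuf, sendcnt, sdispl, MPI_LONG_LONG,
                dir,     recvcnt, rdispl, MPI_LONG_LONG, MPI_COMM_WORLD);

  /* each rank now holds only the pairs in its own value range, O(N/P)
     entries, instead of the full O(N) replicated mapping */
  printf("[%d] directory entries: %d\n", rank, rtot / 2);

  free(sendcnt); free(recvcnt); free(sdispl); free(rdispl);
  free(pos); free(sendbuf); free(dir);
  MPI_Finalize();
  return 0;
}

An AOApplicationToPetsc-style query would be the same exchange in the
other direction, so both memory and communication volume per rank stay
at O(N/P).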

Best
Sebastian

