[petsc-users] SLEPc eigensolver that uses minimal memory and finds ALL eigenvalues of a real symmetric sparse matrix in reasonable time

Shitij Bhargava shitij.cse at gmail.com
Tue Aug 9 02:54:11 CDT 2011


Thanks Jose, Barry.

I tried what you said, but that gives me an error:

[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Argument out of range!
[0]PETSC ERROR: Can only get local values, trying 9!

This is probably because I am trying to insert all rows of the matrix
through process 0, but process 0 doesn't own all the rows.
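
To make the ownership issue concrete, here is a minimal sketch in plain
Python (not PETSc code). It assumes the default contiguous row split that, as
far as I understand, PETSc applies when local sizes are left to PETSC_DECIDE:
every process gets N/size rows and the first N mod size processes get one
extra. The function names and the 18-row, 2-process example are hypothetical
and chosen only so that row 9 falls outside process 0's range.

```python
def ownership_ranges(global_rows, nprocs):
    """Return a list of (start, end) row ranges, end exclusive, one per rank.

    Assumption: contiguous split where the first (global_rows % nprocs)
    ranks receive one extra row.
    """
    ranges = []
    start = 0
    for rank in range(nprocs):
        extra = 1 if rank < global_rows % nprocs else 0
        local = global_rows // nprocs + extra
        ranges.append((start, start + local))
        start += local
    return ranges


def owner_of(row, ranges):
    """Return the rank whose range contains the given global row index."""
    for rank, (start, end) in enumerate(ranges):
        if start <= row < end:
            return rank
    raise IndexError("row %d is outside the matrix" % row)


# Hypothetical example: 18 global rows split over 2 processes.
ranges = ownership_ranges(18, 2)
for rank, (start, end) in enumerate(ranges):
    print("rank", rank, "owns rows", start, "to", end - 1)
print("row 9 is owned by rank", owner_of(9, ranges))
```

In this picture, process 0 can only touch rows 0 to 8 directly, which matches
the kind of complaint in the error message about row 9.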

In any case, this seems very "unnatural", so I am now using MPIAIJ the right
way, as you said: I assemble the MPIAIJ matrix in parallel instead of on only
one process. I have already made that change and am running the code on the
cluster right now. It is going to take a very long time to finish, so I can't
verify some of my doubts myself, which is why I am asking them below:

1. If I run the code with 1 process and it takes M memory (peak) while
solving for eigenvalues, then when I run it with N processes, each will take
nearly M/N memory (peak), probably a little more, right? And to get this I
don't have to use any special MPI calls: using MPIAIJ, building the EPS
object from it, and then calling EPSSolve() is enough? I mean, EPSSolve()
internally distributes memory and computational effort automatically when I
use MPIAIJ and run the code with many processes, right?

This confusion arises because when I run the code with 8 processes, top
showed nearly 250 MB for each of them initially, but each has grown to
270 MB in about 70 minutes. I understand that with the krylovschur method
the memory requirements increase slowly, but the peak on any process will be
lower than if I ran only one process, right? (Even though their memory
requirements are growing, they will grow only to about M/N, right?)
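
A rough way to reason about the M/N question is to separate storage that is
split across processes from storage that every process holds in full. The
sketch below is plain Python and rests on assumptions, not on measurements of
this program: it assumes the ncv basis vectors are distributed by rows, and
that the dense projected problem of dimension mpd is replicated on every
process (my reading of the SLEPc documentation on computing many
eigenvalues). The function names and the sizes are hypothetical.

```python
def basis_bytes_per_process(n, ncv, nprocs, bytes_per_scalar=8):
    """Memory for ncv basis vectors of global length n, split by rows.

    Assumption: each process stores at most ceil(n / nprocs) entries of
    every basis vector.
    """
    local_rows = -(-n // nprocs)  # ceiling division
    return local_rows * ncv * bytes_per_scalar


def projected_bytes_per_process(mpd, bytes_per_scalar=8):
    """Memory for one dense mpd x mpd projected matrix.

    Assumption: this matrix is replicated, so it does not shrink when
    more processes are used.
    """
    return mpd * mpd * bytes_per_scalar


# Hypothetical sizes: all eigenvalues of a matrix of dimension 9000,
# with ncv and mpd both equal to the matrix dimension.
n, ncv, mpd = 9000, 9000, 9000
for nprocs in (1, 8):
    basis = basis_bytes_per_process(n, ncv, nprocs)
    projected = projected_bytes_per_process(mpd)
    print(nprocs, "process(es):",
          "basis", basis // 10**6, "MB,",
          "projected", projected // 10**6, "MB per process")
```

Under these assumptions only the distributed part scales like M/N. The
replicated part stays the same on every process, which is one reason to
choose mpd deliberately rather than leave it at its default when asking for a
very large number of eigenvalues.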

Actually, in this case each process creates its own EPS context, initializes
it itself, and then calls EPSSolve() itself without any explicit
"interaction" with the other processes. This makes me wonder whether they
really are working together or just individually. (I would have verified
this myself, but the program takes far too long, and I know I would have to
kill it sooner or later.) Or is the fact that they initialize their own EPS
context with THEIR part of the MPIAIJ matrix enough to make them "cooperate
and work together"? (I think this is what Barry meant in his last post, but
I am not too sure.)
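
The idea behind "every process makes the same calls on its own part" can be
illustrated without MPI. The following plain-Python sketch simulates a
row-distributed matrix-vector product: each simulated rank multiplies only its
own block of rows, and the pieces together equal the serial result. It is an
illustration of the concept only, not of how PETSc or SLEPc implement it; the
function names and the small 4x4 matrix are hypothetical.

```python
def local_matvec(local_rows, x):
    """Multiply a block of matrix rows by the full vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in local_rows]


def distributed_matvec(matrix, x, nprocs):
    """Simulate nprocs ranks, each computing y only for the rows it owns.

    In a real MPI run the ranks would execute concurrently and exchange
    the entries of x they need; here they are simply visited in turn.
    """
    n = len(matrix)
    y = []
    start = 0
    for rank in range(nprocs):
        local = n // nprocs + (1 if rank < n % nprocs else 0)
        y.extend(local_matvec(matrix[start:start + local], x))
        start += local
    return y


# Hypothetical 4x4 symmetric tridiagonal matrix and test vector.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
x = [1.0, 2.0, 3.0, 4.0]

serial = local_matvec(A, x)
parallel = distributed_matvec(A, x, 2)
print("results match:", serial == parallel)
```

The cooperation comes from the objects living on a shared communicator: when
the matrix and the EPS context are created on the same parallel communicator,
the same EPSSolve() call on every process is one joint solve, and the
communication happens inside the library.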

I am not yet comfortable with the MPI way of thinking, which is probably
why I have this confusion.

Anyway, I can't thank you guys enough. I would have been scrounging through
the documentation again and again to no avail if you had not helped me the
way you did. The responses were always prompt, always to the point (even
though my questions sometimes were not, probably because I didn't completely
understand the problems I was facing, but you always knew what I was
asking), and very clear. At this moment I don't know much about PETSc/SLEPc
myself, but I will be sure to contribute back to this list when I do. I have
nothing but sincere gratitude for you guys.


Thank you very much!

Shitij


On 9 August 2011 00:58, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> On Aug 8, 2011, at 2:14 AM, Shitij Bhargava wrote:
>
> > Thank you Jed. That was indeed the problem. I installed a separate MPI
> > for PETSc/SLEPc, but was running my program with a default, already
> > installed one.
> >
> > Now, I have a different question. What I want to do is this:
> >
> > 1. Only 1 process, say root, calculates the matrix in SeqAIJ format
> > 2. Then root creates the EPS context, eps, and initializes it, sets
> > parameters, problem type, etc. properly
> > 3. After this the root process broadcasts this eps object to other
> > processes
> > 4. I use EPSSolve to solve for eigenvalues (all processes together in
> > cooperation resulting in memory distribution)
> > 5. I get the results from root
>
>    We do have an undocumented routine MatDistribute_MPIAIJ(MPI_Comm
> comm,Mat gmat,PetscInt m,MatReuse reuse,Mat *inmat) in
> src/mat/impls/aij/mpi/mpiaij.c that will take a SeqAIJ matrix and distribute
> it over a larger MPI communicator.
>
>   Note that you cannot create the EPS context etc. on the root process and
> then broadcast the object, but once the matrix is distributed you can simply
> create the EPS context etc. on the parallel communicator where the matrix is
> and run with that.
>
>   Barry
>
> >
> > Is this possible? I am not able to broadcast the EPS object, because it
> > is not an MPI_Datatype. Is there any PETSc/SLEPc function for this? I am
> > avoiding MPIAIJ because that would mean making many changes in the
> > existing code, including the numerous write(*,*) statements (I would have
> > to convert them to PetscPrint in FORTRAN or something like that).
> > So I want a single process to handle matrix generation and assembly, but
> > want to solve the eigenproblem in parallel with different processes.
> > Running the subroutine EPSSolve in parallel, and hence distributing
> > memory, is the only reason why I want to use MPI.
> >
> > Thanks a lot !!
> >
> > Shitij
> >
> > On 8 August 2011 11:05, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> > On Mon, Aug 8, 2011 at 00:29, Shitij Bhargava <shitij.cse at gmail.com>
> > wrote:
> > I ran it with:
> >
> > mpirun -np 2 ./slepcEigenMPI -eps_monitor
> >
> > I didn't do exactly what you said, because the matrix generation part in
> > the actual program is quite time consuming itself. But I assume what I am
> > doing is equivalent to what you meant? Also, I set MPD to PETSC_DECIDE,
> > because I didn't know what to set it to for this matrix dimension.
> >
> > This is the output I get: (part of the output)
> > MATRIX ASSMEBLY DONE !!!!!!!!
> >
> > MATRIX ASSMEBLY DONE !!!!!!!!
> >
> >   1 EPS nconv=98 first unconverged value (error) 1490.88 (1.73958730e-05)
> >   1 EPS nconv=98 first unconverged value (error) 1490.88 (1.73958730e-05)
> >   2 EPS nconv=282 first unconverged value (error) 3.04636e-27 (2.49532175e-04)
> >   2 EPS nconv=282 first unconverged value (error) 3.04636e-27 (2.49532175e-04)
> >
> > The most likely case is that you have more than one MPI implementation
> > installed and that you are running with a different implementation than
> > you built with. Compare the outputs:
> >
> > $ ldd ./slepcEigenMPI
> > $ which mpirun
> >
>
>