2 Questions about DAs

Tue May 13 10:46:35 CDT 2008

Hello:
Thanks for all of your help, this has helped me tremendously!

Milad

On Mon, May 12, 2008 at 7:22 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>   A couple of items.
>
>     Overlapping communication and computation is pretty much a myth. The CPU
>  is used by MPI to pack the messages and put them on the network so it is not
>  available for computation during this time. Usually if you try to overlap
>  communication and computation it will end up being slower and I've never
>  seen it faster. Vendors will try to trick you into buying a machine by
>  saying it does it, but it really doesn't. Just forget about trying to do it.
>
>    Creating a DA involves a good amount of setup and some communication. It
>  is fine to use a few DAs, but setting up hundreds of DAs is not a good idea
>  UNLESS YOU DO TONS OF WORK for each DA. In your case you are doing just a
>  tiny amount of communication with each DA, so the DA setup time is
>  dominating.
>
>   If you have hundreds of vectors that you wish to communicate AT THE SAME
>  TIME (seems strange, but I suppose it is possible), then rather than having
>  hundreds of DAGlobalToLocalBegin/End() calls in a row you will want to create
>  an additional "meta" DA that has the same m,n,p as the regular DA but has a
>  dof equal to the number of vectors you wish to communicate at the same time.
>  Use VecStrideGatherAll() to get the individual vectors into a meta vector, do
>  the DAGlobalToLocalBegin/End() on the meta vector to get the ghost values,
>  and then use VecStrideScatterAll() to get the values into the 322 individual
>  ghosted vectors. The reason to do it this way is so the values in all the
>  vectors are sent together in a single MPI message instead of the separate
>  messages that would be needed for each of the small
>  DAGlobalToLocalBegin/End() calls.
>
>    Barry
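
The "meta" vector layout described above can be sketched in plain C. This is
only an illustration of the interleaving idea, assuming each vector is a flat
array of doubles; it does not call PETSc, and the helper names interleave()
and deinterleave() are hypothetical, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Pack nvec separate arrays (each of length npts) into one buffer so that
 * all components of grid point i sit next to each other:
 *     meta[i * nvec + c] == vecs[c][i]
 * This mirrors the storage of a DA whose dof equals nvec (assumption:
 * component index varies fastest). */
void interleave(size_t nvec, size_t npts,
                const double *const *vecs, double *meta)
{
    assert(vecs != NULL && meta != NULL);
    for (size_t i = 0; i < npts; ++i)
        for (size_t c = 0; c < nvec; ++c)
            meta[i * nvec + c] = vecs[c][i];
}

/* Inverse of interleave(): copy each component of the packed buffer back
 * into its own array. */
void deinterleave(size_t nvec, size_t npts,
                  const double *meta, double *const *vecs)
{
    assert(vecs != NULL && meta != NULL);
    for (size_t i = 0; i < npts; ++i)
        for (size_t c = 0; c < nvec; ++c)
            vecs[c][i] = meta[i * nvec + c];
}
```

With the data packed this way, one ghost update on the packed buffer moves
the boundary values of every vector at once, which is the point of the
single-message approach.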
>
>
>
>
>
>  On May 12, 2008, at 6:21 PM, Milad Fatenejad wrote:
>
>
> >
> >
> >
> > Hi:
> > I created a simple test problem that demonstrates the issue. In the
> > test problem, 100 vectors are created using:
> > single.cpp: a single distributed array and
> > multi.cpp: 100 distributed arrays
> >
> > Some math is performed on the vectors, then they are scattered to
> > local vectors.
> >
> > The log summary (running 2 processes) shows that multi.cpp uses more
> > memory and performs more reductions than single.cpp, which is similar
> > to the experience I had with my program...
> >
> > I hope this helps
> > Milad
> >
> > On Mon, May 12, 2008 at 3:15 PM, Matthew Knepley <knepley at gmail.com> wrote:
> >
> > > On Mon, May 12, 2008 at 3:01 PM, Milad Fatenejad <icksa1 at gmail.com> wrote:
> > >
> > > > Hello:
> > > > I've attached the results of two calculations. The file "log-multi-da"
> > > > uses 1 DA for each vector (322 in all) and the file "log-single-da"
> > > > uses 1 DA for the entire calculation. When using 322 DAs, about 10x
> > > > more time is spent in VecScatterBegin and VecScatterEnd. Both runs
> > > > used two processes.
> > > >
> > > > I should mention that the source code for these two runs was exactly
> > > > the same; I didn't reorder the scatters. The only difference was the
> > > > number of DAs.
> > > >
> > > > Any suggestions? Do you think this is related to the number of DAs,
> > > > or something else?
> > > >
> > >
> > > There are vastly different numbers of reductions and much bigger memory
> > > usage.
> > > Please send the code and I will look at it.
> > >
> > >  Matt
> > >
> > >
> > >
> > >
> > > > Thanks for your help
> > > > Milad
> > > >
> > > > On Mon, May 12, 2008 at 1:56 PM, Matthew Knepley <knepley at gmail.com> wrote:
> > > >
> > > > >
> > > > > On Mon, May 12, 2008 at 11:02 AM, Milad Fatenejad <mfatenejad at wisc.edu> wrote:
> > > > >
> > > > > > Hello:
> > > > > > I have two separate DA questions:
> > > > > >
> > > > > > 1) I am writing a large finite difference code and would like to be
> > > > > > able to represent an array of vectors. I am currently doing this by
> > > > > > creating a single DA and calling DACreateGlobalVector several times,
> > > > > > but the manual also states that:
> > > > > >
> > > > > > "PETSc currently provides no container for multiple arrays sharing the
> > > > > > same distributed array communication; note, however, that the dof
> > > > > > parameter handles many cases of interest."
> > > > > >
> > > > > > I also found the following mailing list thread which describes how to
> > > > > > use the dof parameter to represent several vectors:
> > > > > >
> > > > > > http://www-unix.mcs.anl.gov/web-mail-archive/lists/petsc-users/2008/02/msg00040.html
> > > > > >
> > > > > > Where the following solution is proposed:
> > > > > > """
> > > > > > The easiest thing to do in C is to declare a struct:
> > > > > >
> > > > > > typedef struct {
> > > > > >  PetscScalar v[3];
> > > > > >  PetscScalar p;
> > > > > > } Space;
> > > > > >
> > > > > > and then cast pointers
> > > > > >
> > > > > >  Space ***array;
> > > > > >
> > > > > >  DAVecGetArray(da, u, (void *) &array);
> > > > > >
> > > > > >    array[k][j][i].v[0] *= -1.0;
> > > > > > """
> > > > > >
> > > > > > The problem with the proposed solution is that it uses a struct to
> > > > > > get the individual values, but what if you don't know the number of
> > > > > > degrees of freedom at compile time?
> > > > > >
> > > > >
> > > > > It would be nice to get variable structs in C. However, you can just
> > > > > dereference the object directly. For example, for 50 degrees of freedom,
> > > > > you can do
> > > > >
> > > > >  array[k][j][i][47] *= -1.0;
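
A minimal sketch of what indexing with a run-time number of degrees of
freedom amounts to, in plain C. It assumes the values are stored in one flat
array with the component index varying fastest, then i, then j, then k. The
GridView type and grid_at() helper are hypothetical names used only for
illustration; they are not part of PETSc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a 3-D grid whose dof is chosen at run time. */
typedef struct {
    size_t nx, ny, nz;  /* number of grid points in i, j, k */
    size_t dof;         /* components per grid point, known only at run time */
    double *data;       /* nx * ny * nz * dof values, component index fastest */
} GridView;

/* Return a pointer to component c at grid point (k, j, i). */
double *grid_at(const GridView *g, size_t k, size_t j, size_t i, size_t c)
{
    assert(k < g->nz && j < g->ny && i < g->nx && c < g->dof);
    return &g->data[((k * g->ny + j) * g->nx + i) * g->dof + c];
}
```

With such a helper, `*grid_at(&g, k, j, i, 47) *= -1.0;` plays the role of
the four-index expression above, and no struct with a fixed number of fields
is needed.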
> > > > >
> > > > >
> > > > >
> > > > > > So my question is twofold:
> > > > > > a) Is there a problem with just having a single DA and calling
> > > > > > DACreateGlobalVector multiple times? Does this affect performance at
> > > > > > all (I have many different vectors)?
> > > > > >
> > > > >
> > > > > These are all independent objects. Thus, by itself, creating any number of
> > > > > Vecs does nothing to performance (unless you start to run out of memory).
> > > > >
> > > > >
> > > > >
> > > > > > b) Is there a way to use the dof parameter when creating a DA when the
> > > > > > number of degrees of freedom is not known at compile time?
> > > > > > Specifically, I would like to be able to access the individual values
> > > > > > of the vector, just like the example shows...
> > > > > >
> > > > >
> > > > >
> > > > > see above.
> > > > >
> > > > >
> > > > > > 2) The code I am writing has a lot of different parts which present a
> > > > > > lot of opportunities to overlap communication and computation when
> > > > > > scattering vectors to update values in the ghost points. Right now,
> > > > > > all of my vectors (there are ~50 of them) share a single DA because
> > > > > > they all have the same shape. However, by sharing a single DA, I can
> > > > > > only scatter one vector at a time. It would be nice to be able to
> > > > > > start scattering each vector right after I'm done computing it, and
> > > > > > finish scattering it right before I need it again, but I can't because
> > > > > > other vectors might need to be scattered in between. I then re-wrote
> > > > > > part of my code so that each vector had its own DA object, but this
> > > > > > ended up being incredibly slow (I assume this is because I have so
> > > > > > many vectors).
> > > > > >
> > > > >
> > > > > The problem here is that buffering will have to be done for each outstanding
> > > > > scatter. Thus I see two resolutions:
> > > > >
> > > > >  1) Duplicate the DA scatter for as many Vecs as you wish to scatter at once.
> > > > >     This is essentially what you accomplish with separate DAs.
> > > > >
> > > > >  2) Use the dof method. However, this scatters ALL the vectors every time.
> > > > >
> > > > > I do not understand what performance problem you would have with multiple
> > > > > DAs. With any performance questions, we suggest sending the output of
> > > > > -log_summary so we have data to look at.
> > > > >
> > > > >  Matt
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > My question is, is there a way to scatter multiple vectors
> > > > > > simultaneously without affecting the performance of the code? Does it
> > > > > > make sense to do this?
> > > > > >
> > > > > >
> > > > > > I'd really appreciate any help...
> > > > > >
> > > > > > Thanks
> > > > > > Milad Fatenejad
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > What most experimenters take for granted before they begin their
> > > > > experiments is infinitely more interesting than any results to which
> > > > > their experiments lead.
> > > > > -- Norbert Wiener
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > <log-multi><log-single><multi.cpp><single.cpp>
> >
>
>