how to initialize a parallel vector from one process?

Sun Jul 6 05:01:31 CDT 2008

Thank you Jed,

I'm in the same situation: I cannot assume that the data will fit in the memory of a single process.
I will certainly end up implementing it the dirty way.
It would be neat if a function could take care of distributing the data to each process based on its global index range. Maybe such a function already exists...
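
To illustrate the bookkeeping such a function would need, here is a minimal serial sketch in plain Python (not PETSc code). It assumes the contiguous layout that PETSC_DECIDE gives a plain Vec, where the first (N mod P) ranks each own one extra entry. The names `ownership_ranges` and `owner_of` are hypothetical helpers, not PETSc functions.

```python
from bisect import bisect_right

def ownership_ranges(n_global, n_procs):
    """Return `starts` such that rank r owns the half-open range
    [starts[r], starts[r + 1]). Assumes a contiguous split in which the
    first (n_global % n_procs) ranks get one extra entry."""
    starts = [0]
    for rank in range(n_procs):
        extra = 1 if rank < n_global % n_procs else 0
        starts.append(starts[-1] + n_global // n_procs + extra)
    return starts

def owner_of(global_index, starts):
    """Return the rank whose range contains `global_index`."""
    if not 0 <= global_index < starts[-1]:
        raise IndexError("global index out of range")
    # bisect_right skips over empty ranges (repeated start values).
    return bisect_right(starts, global_index) - 1

starts = ownership_ranges(10, 3)  # three ranks sharing ten entries
owners = [owner_of(i, starts) for i in range(10)]
```

With these ranges, the process that holds the data can group values by destination rank before sending them.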

Another solution would be to write a new viewer similar to the binary viewer, except that instead of reading from a file it would query a database.
Did you consider this option while implementing your own initialization?
Do you know if there is any documentation on how to implement a new viewer?

Thierry

-----Original Message-----
From: owner-petsc-users at mcs.anl.gov [mailto:owner-petsc-users at mcs.anl.gov] On Behalf Of Jed Brown
Sent: Sunday, July 06, 2008 12:27 PM
To: petsc-users at mcs.anl.gov
Subject: Re: how to initialize a parallel vector from one process?

I've done something similar when reading and writing netCDF in parallel where
the files are too large to be stored on a single processor.  NetCDF-4 makes this
obsolete, but here's the idea:

* The new parallel job makes a PETSc DA and uses PETSC_DECIDE for partitioning.

* Process zero reads the header and broadcasts the dimensions.

* Each process determines the index range it needs to interpolate the file data
  onto the locally owned computational grid.  Send this index range to rank
  zero.

* Rank zero reads each of the blocks sequentially (the netCDF API can read an
  (imin,imax)x(jmin,jmax)x(kmin,kmax) block) and sends it to the appropriate process.

* Each process does the necessary interpolation locally.
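
The steps above can be sketched serially in plain Python. This is only an illustration under stated assumptions, not the actual implementation: the one-point halo used for interpolation is an assumption, `needed_box`, `read_block` and `serve_requests` are hypothetical names, `read_block` stands in for the file read (it is not the netCDF API), and yielding a block stands in for a point-to-point MPI send.

```python
def needed_box(owned, dims, halo=1):
    """Grow an owned half-open box by `halo` points on each side,
    clipped to the file extent `dims` (the halo width is an assumption)."""
    return tuple(
        (max(lo - halo, 0), min(hi + halo, n))
        for (lo, hi), n in zip(owned, dims)
    )

def read_block(field, box):
    """Stand-in for reading one index box from the file."""
    (i0, i1), (j0, j1), (k0, k1) = box
    return [[[field[i][j][k] for k in range(k0, k1)]
             for j in range(j0, j1)]
            for i in range(i0, i1)]

def serve_requests(field, requests):
    """Rank zero's loop: read one requested block at a time and hand it
    over, so only a single block is held in memory at once."""
    for rank in sorted(requests):
        yield rank, read_block(field, requests[rank])  # stands in for a send

# Toy data: a 4x2x2 "file" split along i between two ranks.
dims = (4, 2, 2)
field = [[[100 * i + 10 * j + k for k in range(dims[2])]
          for j in range(dims[1])]
         for i in range(dims[0])]
owned = {0: ((0, 2), (0, 2), (0, 2)), 1: ((2, 4), (0, 2), (0, 2))}
requests = {rank: needed_box(box, dims) for rank, box in owned.items()}
received = dict(serve_requests(field, requests))
```

Because the blocks are produced one at a time, rank zero never needs more memory than the largest single request.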

I've found that this performs just fine for many GiB of state and hundreds of
processors.  You have to get your hands a bit dirty with MPI.  Of course, there
are simpler (pure PETSc) solutions if you can fit the whole state on process
zero or if you can use the PETSc binary format.

Jed

On Sun 2008-07-06 11:22, Tonellot, Thierry-Laurent D wrote:
> Hi,
>
>
>
> At the very beginning of my application I need to read data from a database to
> initialize a 3D vector distributed over several processes.
>
> The database I'm using can only be accessed by one process (for instance
> process 0). Moreover, for performance reasons, we need to limit the number of
> requests to the database. Therefore the data need to be read by slices, for
> instance (z,x) slices.
>
> A typical initialization would then consist of the following pseudocode:
>
>
>
> Loop over y
>     If (rank = 0)
>         Read slice y
>         Send to each process the appropriate part of the data slice
>     Else
>         Receive data
>     Endif
> End loop
>
>
>
> This process is quite heavy and its performance will probably depend on the
> way it is implemented.
>
>
>
> I'm wondering if there is any way to perform this initialization efficiently
> using PETSc?
>
>
>
> I'm also considering other packages to handle distributed arrays and I'm
> wondering how a package like Global Arrays compares with the PETSc DA?
>
>
>
> For instance, Global Arrays seems to have a feature that partly solves my
> problem above: the function "ga_fill_patch", which fills only a region of
> the parallel vector and can be called by any process...
>
>
>
> Thank you in advance,
>
>
>
> Thierry
>
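
For reference, the slice-by-slice loop from the quoted message can be sketched serially in plain Python. This is a minimal illustration, assuming the grid is partitioned along x only. `split_slice` and `distribute_by_slices` are hypothetical names, `read_slice` stands in for one database request, and appending to a per-rank list stands in for an MPI send and receive.

```python
def split_slice(slice_zx, x_starts):
    """Cut one (z, x) slice into per-rank pieces along x.
    Rank r gets columns x_starts[r] .. x_starts[r + 1] - 1."""
    return [
        [row[x_starts[r]:x_starts[r + 1]] for row in slice_zx]
        for r in range(len(x_starts) - 1)
    ]

def distribute_by_slices(read_slice, ny, x_starts):
    """Read ny slices one at a time and deliver each rank its part."""
    n_ranks = len(x_starts) - 1
    local = [[] for _ in range(n_ranks)]
    for y in range(ny):
        slice_zx = read_slice(y)  # one database request per y
        for rank, piece in enumerate(split_slice(slice_zx, x_starts)):
            local[rank].append(piece)  # stands in for a send to `rank`
    return local

def fake_read_slice(y):
    """Toy stand-in for the database: 2 rows in z, 4 columns in x."""
    return [[100 * y + 10 * z + x for x in range(4)] for z in range(2)]

local = distribute_by_slices(fake_read_slice, ny=3, x_starts=[0, 2, 4])
```

Only one slice is held by the reading process at any time, which matches the constraint that the full data set does not fit in one process's memory.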