sieve-dev a few questions on memory usage of isieve code

Shi Jin jinzishuai at gmail.com
Wed Dec 10 11:13:04 CST 2008


Hi there,

Just in case this thread was missed, I am sending it again. I would greatly
appreciate your advice here, since I need it to decide on my next task.
Thank you very much.

Shi

On Tue, Dec 2, 2008 at 7:18 PM, Shi Jin <jinzishuai at gmail.com> wrote:

> Hi Matt,
>
> Thank you for your fix. I am able to build petsc-dev now without any
> problem.
>
> First, I want to let you know that I am now able to run very large
> simulations using isieve. Previously, we had a big limitation on the
> problem size: in order to distribute a large second-order finite element
> mesh, we needed a large shared-memory machine. I was able to improve the
> situation a little by storing the distributed sieve data structures in
> files and loading them in parallel on distributed-memory clusters, but
> that did not eliminate the need for a large shared-memory machine at the
> very early stage. Recently, I eliminated this need, based on the fact
> that so far our simulations are done in cylinders (round or rectangular),
> there is no need for different meshing along the axial direction, and, to
> make particle tracking easier, our domain interfaces are simple planar
> surfaces perpendicular to the axis. So along the axis I simply have the
> same mesh repeated with a shift. I was able to reconstruct the
> unstructured mesh on each process from the sieve data generated for a
> two-process distributed mesh, since the slave nodes all have essentially
> the same topology. Now I can introduce as many processes as we want,
> making the whole domain arbitrarily long. This is not a general solution,
> but it suits our needs perfectly, at least for the time being. If you are
> interested in my implementation, I am happy to share it.
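>
> In case it helps, here is a minimal sketch of the replication idea. The
> names and data layout are made up for illustration only; the real code
> works directly on the sieve data rather than on raw arrays like these.
>
> // Hypothetical sketch: each rank rebuilds its own slab of the repetitive
> // mesh by shifting the axial (z) coordinate of a template slab by
> // rank * slabLength. The topology is identical on every rank.
> #include <cstdio>
> #include <vector>
>
> struct SlabMesh {
>   std::vector<double> coords;       // x0,y0,z0, x1,y1,z1, ...
>   std::vector<int>    cellVertices; // connectivity of the template slab
> };
>
> SlabMesh buildLocalSlab(const SlabMesh& slab, int rank, double slabLength) {
>   SlabMesh local = slab;                         // copy the shared topology
>   for (size_t v = 0; v < local.coords.size() / 3; ++v)
>     local.coords[3*v + 2] += rank * slabLength;  // shift along the axis only
>   return local;
> }
>
> int main() {
>   SlabMesh slab;                                 // a one-element toy template
>   slab.coords       = {0,0,0, 1,0,0, 0,1,0, 0,0,1};
>   slab.cellVertices = {0, 1, 2, 3};
>   SlabMesh mine = buildLocalSlab(slab, /*rank=*/3, /*slabLength=*/1.0);
>   std::printf("first z-coordinate on rank 3: %g\n", mine.coords[2]);
>   return 0;
> }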
>
> That said, I still face a bit of a memory issue, since I want to fit as
> many elements per process as possible. I did a detailed profiling of the
> memory usage of a serial run (the parallel version is identical on each
> process) with 259,200 elements (379,093 second-order nodes). The
> breakdown looks like the following:
> sieve mesh:              290 MB
> discretization:           25 MB
> uvwp:                    270 MB
> global order-p:           90 MB
> global order-vel:        132 MB
> caching:                  25 MB
> USG->CFD: dumping mesh:   72 MB
> matrix/vector:           557 MB
> -------------------------------
> Total:                  1461 MB
> The matrix/vector usage is already as lean as it can get, but I think
> there is room for improvement in the other parts, discussed below.
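>
> For reference, here is the per-element cost implied by the numbers above
> (plain arithmetic, nothing Sieve-specific):
>
> #include <cstdio>
> int main() {
>   const double elements = 259200, nodes = 379093;   // from the run above
>   const double totalMB = 1461, meshMB = 290, matvecMB = 557;
>   std::printf("total   : %.1f KB per element\n", totalMB  * 1024 / elements);
>   std::printf("mesh    : %.1f KB per element\n", meshMB   * 1024 / elements);
>   std::printf("mat/vec : %.1f KB per element (%.1f KB per node)\n",
>               matvecMB * 1024 / elements, matvecMB * 1024 / nodes);
>   return 0;
> }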
>
> 1. About the sieve mesh: there is not much to be done, since you have
> already optimized it. Still, it would be nice to have the option of not
> including faces in the data structure. Face points account for about 50%
> of all points, and since my code does not need faces, skipping them would
> save a significant amount of memory. I also think that letting users
> choose the level of interpolation would be a good feature for a general
> library. However, this is not critical.
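>
> Here is a rough count of why faces are such a large fraction of the
> points. It assumes a tetrahedral mesh and uses the usual estimates (each
> interior face is shared by two cells, the Euler relation gives the edge
> count), so these are estimates rather than measured numbers:
>
> #include <cstdio>
> int main() {
>   const double C = 259200;     // cells, from the run above
>   const double V = C / 5.7;    // vertices: a typical ratio for tet meshes
>   const double F = 2.0 * C;    // faces: 4 per tet, interior faces shared by 2
>   const double E = V + F - C;  // Euler relation: V - E + F - C ~ 0
>   const double points = C + F + E + V;  // all points of an interpolated mesh
>   std::printf("faces: roughly %.0f%% of all points\n", 100.0 * F / points);
>   return 0;
> }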
>
> 2. About the global orders: I think I mentioned this before and was told
> that the global orders are currently stored in the old fashion and are
> therefore not optimized. It should be possible to use contiguous memory,
> just as isieve does, which would surely save a lot of memory. I guess it
> is just a matter of time before this happens, right?
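>
> To make concrete what I mean by contiguous memory, here is a generic
> illustration (not the actual global order code): instead of one map entry
> per point, the (offset, dof) pairs can live in two flat arrays indexed by
> point number, which removes the per-node overhead of the tree.
>
> #include <cstdio>
> #include <map>
> #include <utility>
> #include <vector>
>
> int main() {
>   const int nPoints = 379093;                   // second-order nodes above
>   std::map<int, std::pair<int,int> > oldOrder;  // old style: one tree node/point
>   std::vector<int> offsets(nPoints), ndofs(nPoints);  // contiguous style
>   for (int p = 0; p < nPoints; ++p) {
>     oldOrder[p] = std::make_pair(3*p, 3);
>     offsets[p] = 3*p;
>     ndofs[p]   = 3;
>   }
>   // very rough footprint: ~48 bytes per red-black tree node versus
>   // 8 bytes per point for the two int arrays
>   std::printf("map    : ~%zu KB\n", (size_t)nPoints * 48 / 1024);
>   std::printf("arrays : ~%zu KB\n", (size_t)nPoints * 8  / 1024);
>   return 0;
> }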
>
> 3. Finally, the fields: I am still using the idea of fibration. The u, v,
> w, and p fields are obtained as fibrations of a single field s. This has
> worked very well, but I see that it takes a lot of memory to create,
> almost as much as the mesh itself. I remember you told me there is a new
> way: create one mesh for each field, and let the meshes share the same
> sieve data. I think you have optimized this as ifield, right? I have not
> tried it yet. This is the really pressing question I am asking here: how
> do the memory usages of the two ways of building fields compare? If a lot
> of memory can be saved, I will definitely switch to the new method, and I
> would love more guidance on how to implement it.
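>
> Just so I am sure I understand the new approach, here is my guess at how
> the sharing works, written with a generic shared_ptr rather than the
> actual Mesh/IField interface: every per-field object holds a reference to
> one shared topology, so the sieve data exists only once no matter how
> many fields there are, and only the dof storage is duplicated.
>
> #include <cstdio>
> #include <memory>
> #include <string>
> #include <vector>
>
> struct Topology {                        // stands in for the shared sieve data
>   std::vector<int> cones;                // stored once, whatever its size
> };
>
> struct Field {                           // one per physical field (u, v, w, p)
>   std::string name;
>   std::shared_ptr<const Topology> topo;  // shared, not copied
>   std::vector<double> values;            // only the dof storage is per field
> };
>
> int main() {
>   std::shared_ptr<Topology> topo = std::make_shared<Topology>();
>   Field u{"u", topo, {}}, v{"v", topo, {}}, w{"w", topo, {}}, p{"p", topo, {}};
>   std::printf("topology use count: %ld\n", (long)topo.use_count());
>   return 0;
> }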
>
> Again, thank you very much for your help.
> --
> Sincerely,
> Shi Jin, Ph.D.
> http://www.ualberta.ca/~sjin1/
>



-- 
Sincerely,
Shi Jin, Ph.D.
http://www.ualberta.ca/~sjin1/