[petsc-dev] profiling question
Leo van Kampenhout
lvankampenhout at gmail.com
Tue Sep 21 03:41:14 CDT 2010
Dear all,
in order to calculate speedup (Sp = T1/Tp) I need an accurate measurement of
T1, the time to solve on 1 processor. I will be using the parallel algorithm
for that, but there seems to be a hiccup.
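For reference, the speedup definition above can be written out as a minimal Python sketch. The function names and the timings in the example are hypothetical, chosen only for illustration; parallel efficiency (Sp divided by the number of processes) is included as an assumption about what would typically be reported alongside Sp.

```python
def speedup(t1, tp):
    """Return Sp = T1 / Tp for wall-clock times given in the same unit."""
    if t1 <= 0 or tp <= 0:
        raise ValueError("timings must be positive")
    return t1 / tp


def efficiency(t1, tp, p):
    """Return parallel efficiency Sp / p for a run on p processes."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t1, tp) / p


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from the cluster.
    t1, t12 = 120.0, 15.0
    print("S12 = %.2f" % speedup(t1, t12))
    print("E12 = %.2f" % efficiency(t1, t12, 12))
```

Since T1 is the numerator, any error in measuring it carries over proportionally into every Sp value derived from it.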
On the cluster I am currently working on, each node consists of 12 PEs that
share memory. If I reserve just 1 PE for my job, the other 11
processors are given to other users, which puts a varying load on the
memory system and makes the timings inaccurate. The solve times I get
range between 1 and 5 minutes, which is not very scientific.
The second idea was to reserve all 12 PEs on the node and let just 1 PE run
the job. However, that single CPU then gets all the memory bandwidth
and never has to wait, so T1 comes out very small. If I
calculate speedup from this T1, the algorithm appears not to scale very well.
Another idea would be to spawn 12 identical jobs on 12 PEs and take the
average runtime. Unfortunately, there is only one PETSC_COMM_WORLD, so I
think this is impossible to do from within one program (MPI_COMM_WORLD).
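One way to picture the "12 identical jobs" idea is to start the copies as independent processes from outside the solver, so that each copy is its own program, and average their wall-clock times. The sketch below is only an illustration under that assumption: the helper names are hypothetical, the command is a placeholder for the real single-process solver invocation, and it does not pin processes to particular PEs, which the batch system or operating system would have to handle.

```python
import statistics
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_one(cmd):
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def time_concurrent_copies(cmd, copies):
    """Run `copies` identical instances of cmd at once; return their times."""
    # One thread per copy, so every instance starts at roughly the same moment
    # and each one is timed independently of the others.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(time_one, [cmd] * copies))


if __name__ == "__main__":
    # Placeholder command; substitute the actual serial solver run here.
    cmd = [sys.executable, "-c", "pass"]
    times = time_concurrent_copies(cmd, copies=12)
    print("mean T1   = %.3f s" % statistics.mean(times))
    print("min / max = %.3f / %.3f s" % (min(times), max(times)))
```

Reporting the minimum and maximum next to the mean shows how much the copies disturb each other, which is the effect this measurement is meant to capture.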
Do you fellow PETSc-users have any ideas on the subject? It would be much
appreciated.
regards,
Leo van Kampenhout