Dear all,

In order to calculate speedup (Sp = T1/Tp), I need an accurate measurement of T1, the time to solve on 1 processor. I will be using the parallel algorithm for that, but there seems to be a hiccup.
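This is a minimal C sketch of the quantities involved. It assumes T1 and Tp are wall-clock times in the same unit. The functions `speedup` and `efficiency` are hypothetical helpers, not PETSc routines. Parallel efficiency, Ep = Sp / p, is a standard companion metric added here only for illustration.

```c
/* Hypothetical helpers, not part of PETSc. */

/* Speedup S_p = T_1 / T_p: how many times faster the p-process run is
 * than the single-process reference run (same time unit for both). */
double speedup(double t1, double tp)
{
    return t1 / tp;
}

/* Parallel efficiency E_p = S_p / p: 1.0 means perfect linear scaling. */
double efficiency(double t1, double tp, int p)
{
    return speedup(t1, tp) / (double)p;
}
```

For example, T1 = 120 s and Tp = 10 s on p = 12 give Sp = 12 and Ep = 1.0. For a fixed Tp, any smaller T1, such as one measured on an otherwise idle node, lowers both numbers proportionally.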

On the cluster I am currently working on, each node consists of 12 PEs that share memory. If I reserve only 1 PE for my job, the other 11 PEs are given to other users. This puts a varying load on the memory system and makes the timings inaccurate: the solve times I get range between 1 and 5 minutes. To me, that spread is not very scientific.
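One common way to reduce this kind of noise is sketched below as an assumption, not a PETSc feature. The idea is to repeat the same solve several times and report the minimum and median rather than a single run. Each time would come from a wall-clock timer such as `MPI_Wtime()`. The helper `summarize_timings` and the `timing_summary` type are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Ascending comparison callback for qsort on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical summary of n repeated wall-clock measurements. */
typedef struct {
    double min;
    double median;
    double mean;
} timing_summary;

/* Hypothetical helper (not a PETSc routine): summarize repeated solve
 * times. Sorts a copy, so the caller's array keeps its original order.
 * Returns all zeros if n == 0 or the copy cannot be allocated. */
timing_summary summarize_timings(const double *t, size_t n)
{
    timing_summary s = {0.0, 0.0, 0.0};
    if (n == 0)
        return s;

    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return s;
    memcpy(sorted, t, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += sorted[i];

    s.min = sorted[0];
    /* Middle element for odd n, mean of the two middle ones for even n. */
    s.median = (n % 2) ? sorted[n / 2]
                       : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    s.mean = sum / (double)n;

    free(sorted);
    return s;
}
```

The minimum approximates the least-disturbed run. The median is less sensitive than the mean to an occasional slow outlier.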

My second idea was to reserve all 12 PEs on the node and let just 1 PE run the job. However, the single process then gets all of the node's memory bandwidth and never has to wait for other processes, so it runs very fast. If I calculate speedup from that T1, the algorithm appears not to scale very well.

Another idea would be to spawn 12 identical jobs on the 12 PEs and take the average runtime. Unfortunately, there is only one PETSC_COMM_WORLD, so I think this is impossible to do from within one program (MPI_COMM_WORLD).

Do any of you fellow PETSc users have ideas on the subject? It would be much appreciated.

Regards,
Leo van Kampenhout