example timings

Rob Ross rross at mcs.anl.gov
Thu Jul 26 12:56:17 CDT 2007


Hi Michael,

It is likely that the parallel netcdf performance is flattening out 
because you're writing to a file system with only two servers (as 
Wei-Keng helped you discover in a subsequent email). It would be helpful 
to know what I/O rate you're getting there (e.g., X GB/sec) and to also 
know what sort of peak rate you get with an I/O benchmark like IOR.
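
To make the comparison concrete: the achieved rate is the bytes written divided by the wall-clock write time, and it can then be expressed as a fraction of the peak rate measured with a benchmark such as IOR. The Python sketch below shows this. The function names and the figures in the usage lines are hypothetical and for illustration only. They are not measurements from this thread.

```python
def io_rate_gb_per_s(bytes_written: int, seconds: float) -> float:
    """Achieved I/O rate in GB/s, using decimal units (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("write time must be positive")
    return bytes_written / seconds / 1e9


def fraction_of_peak(achieved_gb_s: float, ior_peak_gb_s: float) -> float:
    """Achieved rate as a fraction of a peak rate measured with e.g. IOR."""
    if ior_peak_gb_s <= 0:
        raise ValueError("peak rate must be positive")
    return achieved_gb_s / ior_peak_gb_s


# Hypothetical inputs: 4 GB written in 8 s, compared with an assumed 1 GB/s IOR peak.
rate = io_rate_gb_per_s(4_000_000_000, 8.0)
print(f"{rate:.2f} GB/s, {fraction_of_peak(rate, 1.0):.0%} of peak")
```

If the ratio is already close to 1, adding processes cannot be expected to help much.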

Without knowing how close to "peak" I/O you are, and just looking at 
this graph, I'd say that this behavior is pretty good -- you hit a much 
higher peak I/O rate than with serial netcdf, and it is sustained for large numbers of 
processes. That's all you can ask for from an I/O library, given limited 
I/O hardware.
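
The argument above is that, with only two file servers, the aggregate rate must eventually stop growing no matter how many processes write. A deliberately crude bottleneck model illustrates this: the aggregate rate is the smaller of what the clients can push and what the servers can absorb. The per-process and per-server rates used below are hypothetical parameters, not figures for the Manchester system. The model also ignores contention, striping and locking.

```python
def modeled_aggregate_rate(n_procs: int, per_proc_gb_s: float,
                           n_servers: int, per_server_gb_s: float) -> float:
    """Crude bottleneck model: throughput is capped by the slower side.

    Assumes both clients and servers scale linearly and ignores contention,
    so it only shows where a plateau comes from, not its exact height.
    """
    client_side = n_procs * per_proc_gb_s
    server_side = n_servers * per_server_gb_s
    return min(client_side, server_side)


# Hypothetical parameters: 0.25 GB/s per process, two servers at 0.5 GB/s each.
for p in (1, 2, 4, 8, 16, 32):
    print(p, modeled_aggregate_rate(p, 0.25, 2, 0.5))
```

With these assumed numbers the curve flattens once the client side exceeds the two servers' combined capacity. The point at which it flattens depends entirely on the assumed rates.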

Regards,

Rob

michael bane wrote:
> I've attached my example code and a plot of example timings (raw data
> available if anybody wants it) which seem to raise a few points. The
> hardware is a Bull-badged Itanium2 box, with nodes connected by Quadrics
> QsNetII, and each node has four dual-core chips (details:
> http://www.mc.manchester.ac.uk/services/hpc/hardware) and it's running a
> version of mpich2.
> 
> I believe my example code is appropriate, but I'm happy to hear about
> bugs/improvements.
> 
> a) there's 10% variation in run times (!!!) -- see 'initialisation',
> which is serial and done on the rank0 processor (also see serial-netcdf)
> 
> b) parallel netcdf is indeed faster than serial (i.e. 'normal') netcdf for
> more than 2 processors, but only scales reasonably to about 8 cores -- this is
> disappointing. Any thoughts on this? I don't think it's the interconnect,
> since the times level off rather than drop further...
> 
> c) the "gather" is my implementation of gathering data from all MPI
> processes and then writing a (serial) netcdf file; most of this time is
> file I/O, not communication to gather the data
> 
> d) 'serial_p_total' is using parallel netcdf "collective" mode but with only
> the rank0 process writing all the data. I'm not quite sure why it's so bad or
> why it takes longer as #PEs increase. Again, any input is welcome!!!
> 
> e) the example code is attached and may well include errors or mistakes,
> so please don't pass it around without prior permission. If people
> want it, I'm happy to provide it once any discussion of the code/results has
> concluded.
> 
> I think it would be useful if there was a small repository of example
> testcases/benchmarks for (future) users...
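
Points (a) and (b) in the quoted message are easier to discuss with numbers attached. The Python sketch below assumes that repeated wall-clock timings per configuration are available, for example from the raw data offered above. It reports two things:

* the relative spread of repeated runs, so that a 10% run-to-run variation can be weighed against the differences being compared;
* the speedup and parallel efficiency of a timing series relative to a chosen baseline process count.

The function names and the sample timings in the usage lines are hypothetical.

```python
from statistics import mean


def relative_spread(times: list[float]) -> float:
    """(max - min) / mean over repeated timings of the same configuration."""
    if not times:
        raise ValueError("need at least one timing")
    return (max(times) - min(times)) / mean(times)


def speedup_and_efficiency(timings: dict[int, float],
                           base_procs: int) -> dict[int, tuple[float, float]]:
    """Map process count -> (speedup, efficiency) relative to base_procs.

    speedup(p)    = T(base_procs) / T(p)
    efficiency(p) = speedup(p) * base_procs / p
    """
    t_base = timings[base_procs]
    return {
        p: (t_base / t, (t_base / t) * base_procs / p)
        for p, t in sorted(timings.items())
    }


# Hypothetical timings in seconds, for illustration only.
print(f"spread: {relative_spread([10.0, 10.5, 11.0]):.1%}")
for p, (s, e) in speedup_and_efficiency({1: 60.0, 2: 30.0, 4: 20.0}, 1).items():
    print(f"{p:3d} procs: speedup {s:5.2f}, efficiency {e:4.0%}")
```

If the spread across repeats is comparable to the gap between two process counts, that gap is probably within the noise.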




More information about the parallel-netcdf mailing list