example timings
Rob Ross
rross at mcs.anl.gov
Thu Jul 26 12:56:17 CDT 2007
Hi Michael,
It is likely that the parallel netcdf performance is flattening out
because you're writing to a file system with only two servers (as
Wei-Keng helped you discover in a subsequent email). It would be helpful
to know what I/O rate you're getting there (e.g. X GB/sec) and to also
know what sort of peak rate you get with an I/O benchmark like IOR.
Without knowing how close to "peak" I/O you are, and just looking at
this graph, I'd say that this behavior is pretty good -- you hit a much
higher peak I/O rate, and it is sustained for large numbers of
processes. That's all you can ask for from an I/O library, given limited
I/O hardware.
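
As a rough illustration of the comparison described above, the achieved rate can be computed from the bytes written and the elapsed write time, then set against a benchmark peak. This is only a sketch: the function names and all of the numbers below are hypothetical placeholders, not measurements from this thread or output from IOR.

```python
def io_rate_gb_per_s(bytes_written: int, seconds: float) -> float:
    """Return the achieved write rate in GB/s (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_written / seconds / 1e9


def fraction_of_peak(achieved_gb_per_s: float, peak_gb_per_s: float) -> float:
    """Return the achieved rate as a fraction of the benchmark peak rate."""
    if peak_gb_per_s <= 0:
        raise ValueError("peak rate must be positive")
    return achieved_gb_per_s / peak_gb_per_s


if __name__ == "__main__":
    # Hypothetical values: substitute the real file size, the measured
    # write time, and the peak rate reported by an I/O benchmark.
    nbytes = 4 * 10**9   # total bytes written to the file (assumed)
    elapsed = 8.0        # wall-clock seconds spent writing (assumed)
    peak = 1.0           # benchmark peak in GB/s (assumed)

    rate = io_rate_gb_per_s(nbytes, elapsed)
    print(f"{rate:.2f} GB/s, {fraction_of_peak(rate, peak):.0%} of peak")
```

If the fraction is already close to 1, the flattening in the plot most likely reflects the file system rather than the library.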
Regards,
Rob
michael bane wrote:
> I've attached my example code and a plot of example timings (raw data
> available if anybody wants it) which seem to raise a few points. The
> hardware is a Bull badged Itanium2 box, with nodes connected by Quadrics
> QsNetII and each node is 4 dual core chips (details:
> http://www.mc.manchester.ac.uk/services/hpc/hardware) and it's running a
> version of mpich2.
>
> I believe my example code is appropriate but I'm happy to hear about
> bugs/improvements
>
> a) there's 10% variation in run times (!!!) -- see 'initialisation'
> which is serial and done on rank0 processor (also see serial-netcdf)
>
> b) parallel netcdf is indeed faster than serial (i.e. 'normal') netcdf for
> more than 2 processors, but only scales reasonably to about 8 cores -- this is
> disappointing. Any thoughts on this? I don't think it's the interconnect
> since the times level off rather than drop further...
>
> c) the "gather" is my implementation of gathering data off all MPI
> processes and then writing a (serial) netcdf file - most of this time is
> file I/O, not the comms gathering the data
>
> d) 'serial_p_total' is using parallel netcdf "collective" mode but only
> rank0 process writing all the data. Not quite sure why it's so bad or
> takes longer as #PEs increase. Again any input is welcomed!!!
>
> e) the example code is attached and may well include errors or mistakes,
> so please don't pass this around without prior permission. If people
> want it, I'm happy to provide it once any discussion over the code/results
> has concluded.
>
> I think it would be useful if there was a small repository of example
> testcases/benchmarks for (future) users...