example timings
Wei-keng Liao
wkliao at ece.northwestern.edu
Wed Jul 18 12:04:23 CDT 2007
Michael,
I noticed that your codes write the output files to your home directory
which is NFS mounted. Usually, NFS performance for parallel I/O is very
slow.
Can you try writing to the Lustre file system instead? I can see that Lustre
is available on your machine. You may also want to configure a Lustre directory
with a larger stripe count to increase the I/O performance.
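
As a rough illustration of why the stripe count matters, here is a toy model of
round-robin striping, where consecutive stripe-sized chunks of a file go to
consecutive storage targets (OSTs). The function names, the 1 MiB stripe size,
and the 4 MiB written per process are assumptions made up for this sketch, not
values taken from your run. On Lustre the stripe count is normally set per
directory with the `lfs setstripe` tool; the exact options depend on the Lustre
version installed.

```python
# Toy model of Lustre-style round-robin (RAID-0) striping.
# All sizes and counts used here are illustrative assumptions.

def osts_touched(offset, length, stripe_size, stripe_count):
    """Return the sorted OST indices (relative to the file's first OST)
    that a contiguous write of `length` bytes at `offset` touches."""
    if length <= 0:
        return []
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    n_stripes = last_stripe - first_stripe + 1
    if n_stripes >= stripe_count:
        # The write wraps around every OST at least once.
        return list(range(stripe_count))
    return sorted({s % stripe_count
                   for s in range(first_stripe, last_stripe + 1)})


def servers_in_use(nprocs, bytes_per_proc, stripe_size, stripe_count):
    """Each rank writes one contiguous block at rank * bytes_per_proc;
    return how many distinct OSTs are hit by all ranks together."""
    used = set()
    for rank in range(nprocs):
        used.update(osts_touched(rank * bytes_per_proc, bytes_per_proc,
                                 stripe_size, stripe_count))
    return len(used)


if __name__ == "__main__":
    MiB = 1 << 20
    for count in (1, 4, 8):
        # 8 processes, 4 MiB each, 1 MiB stripes (hypothetical values)
        print(count, servers_in_use(8, 4 * MiB, 1 * MiB, count))
```

With a stripe count of 1, every process ends up writing to the same storage
target, so adding processes cannot add bandwidth. With a larger stripe count the
same writes are spread over several targets.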
Wei-keng
On Wed, 18 Jul 2007, michael bane wrote:
> I've attached my example code and a plot of example timings (raw data
> available if anybody wants it) which seem to raise a few points. The
> hardware is a Bull badged Itanium2 box, with nodes connected by Quadrics
> QsNetII and each node is 4 dual core chips (details:
> http://www.mc.manchester.ac.uk/services/hpc/hardware) and it's running a
> version of mpich2.
>
> I believe my example code is appropriate but I'm happy to hear about
> bugs/improvements
>
> a) there's 10% variation in run times (!!!) -- see 'initialisation'
> which is serial and done on rank0 processor (also see serial-netcdf)
>
> b) parallel netcdf is indeed faster than serial (i.e. 'normal') netcdf for
> more than 2 processors, but only scales reasonably to about 8 cores -- this is
> disappointing. Any thoughts on this? I don't think it's the interconnect
> since the times level off rather than drop further...
>
> c) the "gather" is my implementation of gathering data off all MPI
> processes and then writing a (serial) netcdf file - most of this time is
> file I/O, not the comms gathering the data
>
> d) 'serial_p_total' is using parallel netcdf "collective" mode but with only
> the rank0 process writing all the data. Not quite sure why it's so bad or
> takes longer as #PEs increases. Again any input is welcomed!!!
>
> e) the example code is attached and may well include errors or mistakes,
> so please don't pass this around without prior permission. If people
> want, I'm happy to provide it once any discussion over the code/results has
> concluded.
>
> I think it would be useful if there was a small repository of example
> testcases/benchmarks for (future) users...
>
> --
> Michael Bane
> Centre for Atmospheric Science
> University of Manchester, U.K.
> http://cloudbase.phy.umist.ac.uk/people/bane/bane.htm
>