initial timings
Rob Ross
rross at mcs.anl.gov
Mon Aug 25 08:15:19 CDT 2003
Hi Reiner,
That will be IBM's MPI-IO library, which is specifically tuned for GPFS,
the parallel file system available on that platform (the SP).
Regards,
Rob
On Mon, 25 Aug 2003, Reiner Vogelsang wrote:
> Dear John,
> I would like to make some remarks on your results:
>
> First of all, thanks for posting your results.
>
> Moreover, two months ago I was running some performance and throughput
> measurements with ncrcat, one of the NCO utilities, on one of our Altix
> 3000 servers. The setup of those measurements was such that several
> independent tasks of ncrcat were processing replicated sets of the same
> input data. The files were in the range of 1 GB and the filesystem was
> striped over several FC disks and I/O channels.
>
> I found that the performance of the serial NetCDF 3.5.0 library could be
> increased significantly by using an internal port of the FFIO library
> (known from Cray machines) to IA64 running Red Hat 7.2. FFIO can perform
> extra buffering or caching for reads and writes. That is an advantage
> over the standard raw POSIX I/O used in the serial NetCDF library,
> especially for strided I/O patterns which need a lot of seek operations.
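>
> To illustrate the kind of pattern I mean, here is a rough C sketch (made
> up for this mail, not code from NCO or the netCDF library; the file name
> and record sizes are arbitrary) of a strided read done with raw POSIX
> calls. A caching layer like FFIO can serve the same small requests from
> large pre-read blocks instead of issuing one seek per record:
>
>   #include <fcntl.h>
>   #include <stdio.h>
>   #include <unistd.h>
>
>   #define RECLEN 2400   /* bytes per record (arbitrary) */
>   #define STRIDE 150    /* read every 150th record      */
>
>   int main(void)
>   {
>       char rec[RECLEN];
>       int fd = open("data.bin", O_RDONLY);
>       if (fd < 0) { perror("open"); return 1; }
>
>       /* one lseek() plus one small read() per record: cheap behind a
>          user-level cache, expensive as raw POSIX I/O on striped disks */
>       for (long i = 0; i < 600; i++) {
>           if (lseek(fd, (off_t)i * STRIDE * RECLEN, SEEK_SET) < 0) break;
>           if (read(fd, rec, RECLEN) != RECLEN) break;
>           /* ... process rec ... */
>       }
>       close(fd);
>       return 0;
>   }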
>
> Do you know what kind of I/O statements are used in the MPI-I/O part of your
> MPI library?
>
> Anyway, your findings are very promising.
>
> Best regards
> Reiner
>
> PS: Do you mind sending me your Fortran test? I was about to modify the
> C test code in order to measure some performance numbers on an Altix
> 3000. I would be happy to share the results with you.
>
> John Tannahill wrote:
>
> > Some initial timing results for parallel netCDF =>
> >
> > File size: 216 MB
> > Processors: 16 (5x3+1; lonxlat+master)
> > Platform: NERSC IBM-SP (seaborg)
> > 2D domain decomposition (lon/lat)
> > 600x600x150 (lonxlatxlev)
> > real*4 array for I/O
> > Fortran code
> >
> > Method 1: Use serial netCDF.
> > Slaves all read their own data.
> > For output:
> > Slaves send their data to the Master (MPI)
> > (all at once, no buffering; so file size restricted)
> > Master collects and outputs the data
> > (all at once)
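> >
> > In rough terms the Method 1 write path looks like the C sketch below
> > (the real test is Fortran; the variable names, file name, and the
> > assumption of contiguous slabs are just for illustration):
> >
> >   #include <mpi.h>
> >   #include <netcdf.h>
> >   #include <stdlib.h>
> >
> >   /* non-master ranks send their slab to rank 0, which assembles the
> >      global array and writes it in one shot with serial netCDF */
> >   void write_via_master(float *local, int nlocal, int rank, int nprocs)
> >   {
> >       if (rank != 0) {
> >           MPI_Send(local, nlocal, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
> >           return;
> >       }
> >       float *global = malloc((size_t)nlocal * nprocs * sizeof(float));
> >       for (int i = 0; i < nlocal; i++) global[i] = local[i];
> >       for (int p = 1; p < nprocs; p++)
> >           MPI_Recv(global + (size_t)p * nlocal, nlocal, MPI_FLOAT, p, 0,
> >                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> >
> >       int ncid, varid;
> >       nc_open("out.nc", NC_WRITE, &ncid);      /* existing file */
> >       nc_inq_varid(ncid, "field", &varid);
> >       nc_put_var_float(ncid, varid, global);   /* all at once   */
> >       nc_close(ncid);
> >       free(global);
> >   }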
> >
> > Method 2: Use ANL's parallel-netcdf, beta version 0.8.9.
> > Slaves all read their own data, but use parallel-netcdf calls.
> > For output:
> > Slaves all output their own data
> > (all at once)
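> >
> > The Method 2 write path, sketched the same way with the parallel-netcdf
> > C interface (again, the real test is Fortran, and the names, dimension
> > order, and decomposition offsets below are made up):
> >
> >   #include <mpi.h>
> >   #include <pnetcdf.h>
> >
> >   /* every rank writes its own lon/lat slab collectively; no gather to
> >      a master and no single-node memory limit on the global array */
> >   void write_parallel(const float *local,
> >                       MPI_Offset lon0, MPI_Offset lat0,
> >                       MPI_Offset nlon, MPI_Offset nlat, MPI_Offset nlev)
> >   {
> >       int ncid, varid;
> >       ncmpi_open(MPI_COMM_WORLD, "out.nc", NC_WRITE,
> >                  MPI_INFO_NULL, &ncid);
> >       ncmpi_inq_varid(ncid, "field", &varid);
> >
> >       MPI_Offset start[3] = { lon0, lat0, 0 };    /* this rank's slab */
> >       MPI_Offset count[3] = { nlon, nlat, nlev };
> >       ncmpi_put_vara_float_all(ncid, varid, start, count, local);
> >
> >       ncmpi_close(ncid);
> >   }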
> >
> > Read results =>
> >
> > Method 2 appears to be about 33% faster than Method 1.
> >
> > Write results =>
> >
> > Method 2 appears to be about 6-7 times faster than Method 1.
> >
> > Note that these preliminary results are based on the parameters given
> > above. Next week, I hope to look at different machines, different
> > file sizes (although I am memory limited on the Master as to how big
> > I can go), different numbers of processors, etc.
> >
> > Anyway, things look promising.
> >
> > Regards,
> > John