initial timings

Rob Ross rross at mcs.anl.gov
Mon Aug 25 08:15:19 CDT 2003


Hi Reiner,

That will be IBM's MPI-IO library, which is specifically tuned for GPFS, the
parallel file system available on that platform (the IBM SP).

Regards,

Rob

On Mon, 25 Aug 2003, Reiner Vogelsang wrote:

> Dear John,
> I would like to make some remarks on your results:
> 
> First of all, thanks for posting your results.
> 
> Moreover, two months ago I was running some performance and throughput
> measurements with ncrcat, one of the NCO utilities, on one of our Altix
> 3000 servers. The setup of those measurements was such that several
> independent tasks of ncrcat were processing replicated sets of the same
> input data. The files were in the range of 1 GB and the filesystem was
> striped over several FC disks and I/O channels.
> 
> I found that the performance of the serial netCDF 3.5.0 library could be
> increased significantly by using an internal port of the FFIO library
> (known from Cray machines) to IA64 under Red Hat 7.2. FFIO can perform
> extra buffering or caching for reading and writing. This is an advantage
> over the standard raw POSIX I/O used in the serial netCDF library,
> especially for strided I/O patterns, which need a lot of seek operations.
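
To illustrate the access pattern Reiner describes, here is a minimal POSIX
sketch (not the FFIO interface itself) of a task reading its lon/lat
subdomain one row at a time; each row costs a seek plus a short read, which
is the kind of traffic a user-level buffering layer such as FFIO can absorb.
The file name, sizes, and subdomain origin are illustrative assumptions.

/* Seek-heavy strided read: one lseek + one short read per subdomain row. */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define NLON    600       /* global row length, in values (assumed) */
#define MY_NLON 120       /* width of this task's subdomain (assumed) */
#define MY_NLAT 200       /* rows in this task's subdomain (assumed) */

int main(void)
{
    int fd = open("field.dat", O_RDONLY);       /* hypothetical input file */
    float *row = malloc(MY_NLON * sizeof(float));
    off_t lon0 = 240, lat0 = 200;                /* subdomain origin (assumed) */

    for (int j = 0; j < MY_NLAT; j++) {
        /* With raw POSIX I/O every one of these small, strided requests
         * goes to the file system; a caching layer like FFIO can satisfy
         * many of them from memory instead. */
        off_t offset = ((lat0 + j) * (off_t)NLON + lon0) * sizeof(float);
        lseek(fd, offset, SEEK_SET);
        read(fd, row, MY_NLON * sizeof(float));
        /* ... use row ... */
    }

    free(row);
    close(fd);
    return 0;
}
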
> 
> Do you know what kind of I/O statements are used in the MPI-I/O part of your
> MPI library?
> 
> Anyway, your findings are very promising.
> 
> Best regards
>     Reiner
> 
> PS: Do you mind sending me your Fortran test? I was about to modify the
> C test code in order to measure some performance numbers on an Altix
> 3000. I would be happy to share the results with you.
> 
> John Tannahill wrote:
> 
> > Some initial timing results for parallel netCDF =>
> >
> > File size:  216 MB
> > Processors:  16 (5x3+1; lonxlat+master)
> > Platform:   NERSC IBM-SP (seaborg)
> > 2D domain decomposition (lon/lat)
> > 600x600x150 (lonxlatxlev)
> > real*4 array for I/O
> > Fortran code
> >
> > Method 1:  Use serial netCDF.
> >             Slaves all read their own data.
> >             For output:
> >               Slaves send their data to the Master (MPI)
> >                 (all at once, no buffering; so file size restricted)
> >               Master collects and outputs the data
> >                 (all at once)
> >
> > Method 2:  Use ANL's parallel-netcdf, beta version 0.8.9.
> >             Slaves all read their own data, but use parallel-netcdf calls.
> >             For output:
> >               Slaves all output their own data
> >                 (all at once)
> >
> > Read results =>
> >
> >    Method 2 appears to be about 33% faster than Method 1.
> >
> > Write results =>
> >
> >    Method 2 appears to be about 6-7 times faster than Method 1.
> >
> > Note that these preliminary results are based on the parameters given
> > above.  Next week, I hope to look at different machines, different
> > file sizes (although I am memory limited on the Master as to how big
> > I can go), different numbers of processors, etc.
> >
> > Anyway, things look promising.
> >
> > Regards,
> > John



