initial timings

Mon Aug 25 03:29:33 CDT 2003

Dear John,
I would like to make some remarks on your results:

First of all, thanks for posting your results.

Two months ago I ran some performance and throughput measurements with
ncrcat, one of the NCO utilities, on one of our Altix 3000 servers. The
measurements were set up so that several independent ncrcat tasks
processed replicated sets of the same input data. The files were in the
range of 1 GB, and the filesystem was striped over several FC disks and
I/O channels.

I found that the performance of the serial NetCDF library 3.5.0 could be
increased significantly by using an internal port of the FFIO library
(known from Cray machines) to the IA64 version of Red Hat 7.2. FFIO can
perform extra buffering or caching for reads and writes. That is an
advantage over the standard raw POSIX I/O used in the serial NetCDF
library, especially for strided I/O patterns, which need a lot of seek
operations.
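
To illustrate why a buffering layer helps with strided access, here is a
minimal Python sketch. It is not FFIO and not the NetCDF code path; it only
compares one seek plus one read per value against refilling a user-space
buffer and serving values from memory. The function names, the 64 KB buffer
size, and the float32 test file are assumptions made for the illustration.

```python
import os
import struct
import tempfile


def write_test_file(path, n_values):
    """Write n_values consecutive float32 values (0.0, 1.0, ...) to path."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{n_values}f", *(float(i) for i in range(n_values))))


def strided_read_raw(path, stride, count):
    """Read every stride-th float32 with one lseek and one read per value."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    values = []
    try:
        for i in range(count):
            os.lseek(fd, i * stride * 4, os.SEEK_SET)
            values.append(struct.unpack("<f", os.read(fd, 4))[0])
            calls += 2  # one seek, one read
    finally:
        os.close(fd)
    return values, calls


def strided_read_buffered(path, stride, count, buf_bytes=1 << 16):
    """Read the same values, but refill a large in-memory buffer on demand."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    values = []
    buf_start, buf = 0, b""
    try:
        for i in range(count):
            off = i * stride * 4
            in_buffer = buf_start <= off and off + 4 <= buf_start + len(buf)
            if not in_buffer:
                # Buffer miss: fetch one large contiguous chunk from the file.
                os.lseek(fd, off, os.SEEK_SET)
                buf = os.read(fd, buf_bytes)
                buf_start = off
                calls += 2  # one seek, one read
            values.append(struct.unpack_from("<f", buf, off - buf_start)[0])
    finally:
        os.close(fd)
    return values, calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "strided.bin")
        write_test_file(path, 100_000)
        raw, raw_calls = strided_read_raw(path, stride=16, count=1000)
        buffered, buf_calls = strided_read_buffered(path, stride=16, count=1000)
        print("same values:", raw == buffered)
        print("calls, raw:", raw_calls, "buffered:", buf_calls)
```

The buffered variant reads more bytes than it strictly needs, but it issues
far fewer seek and read calls. Whether that trade pays off depends on the
stride and the buffer size.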

Do you know what kind of I/O statements are used in the MPI-I/O part of your
MPI library?

Anyway, your findings are very promising.

Best regards
    Reiner

P.S.: Would you mind sending me your Fortran test? I was about to modify
the test code for C in order to measure some performance numbers on an
Altix 3000. I would be happy to share the results with you.

John Tannahill wrote:

> Some initial timing results for parallel netCDF =>
>
> File size:  216 MB
> Processors:  16 (5x3+1; lonxlat+master)
> Platform:   NERSC IBM-SP (seaborg)
> 2D domain decomposition (lon/lat)
> 600x600x150 (lonxlatxlev)
> real*4 array for I/O
> Fortran code
>
> Method 1:  Use serial netCDF.
>             Slaves all read their own data.
>             For output:
>               Slaves send their data to the Master (MPI)
>                 (all at once, no buffering; so file size restricted)
>               Master collects and outputs the data
>                 (all at once)
>
> Method 2:  Use ANL's parallel-netcdf, beta version 0.8.9.
>             Slaves all read their own data, but use parallel-netcdf calls.
>             For output:
>               Slaves all output their own data
>                 (all at once)
>
> Read results =>
>
>    Method 2 appears to be about 33% faster than Method 1.
>
> Write results =>
>
>    Method 2 appears to be about 6-7 times faster than Method 1.
>
> Note that these preliminary results are based on the parameters given
> above.  Next week, I hope to look at different machines, different
> file sizes (although I am memory limited on the Master as to how big
> I can go), different numbers of processors, etc.
>
> Anyway, things look promising.
>
> Regards,
> John
>
> --
> ============================
> John R. Tannahill
> Lawrence Livermore Nat. Lab.
> P.O. Box 808, M/S L-103
> Livermore, CA  94551
> 925-423-3514
> Fax:  925-423-4908
> ============================
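
For reference, the decomposition quoted above can be sketched in a few lines
of Python. This is only an illustration of the arithmetic, not John's
Fortran test: the function names are hypothetical, and it assumes that the
grid divides evenly among the slaves and that the file holds a single
real*4 array. Each (start, count) pair is the kind of hyperslab description
a slave would pass to a netCDF read or write call.

```python
def decompose(n_lon, n_lat, n_lev, p_lon, p_lat):
    """Return one (start, count) pair per slave for a 2D lon/lat split.

    start and count are (lon, lat, lev) index tuples. The level dimension
    is not split, so every slave holds all levels of its lon/lat patch.
    """
    if n_lon % p_lon or n_lat % p_lat:
        raise ValueError("grid does not divide evenly among the slaves")
    d_lon, d_lat = n_lon // p_lon, n_lat // p_lat
    slabs = []
    for j in range(p_lat):
        for i in range(p_lon):
            start = (i * d_lon, j * d_lat, 0)
            count = (d_lon, d_lat, n_lev)
            slabs.append((start, count))
    return slabs


def slab_bytes(count, bytes_per_value=4):
    """Size in bytes of one slab of real*4 (4-byte) values."""
    n = 1
    for c in count:
        n *= c
    return n * bytes_per_value


# Parameters from the quoted message: 600x600x150 grid, 5x3 slaves.
slabs = decompose(600, 600, 150, 5, 3)
per_slave = slab_bytes(slabs[0][1])
total = sum(slab_bytes(count) for _, count in slabs)
print(len(slabs), "slaves")
print(per_slave / 1e6, "MB per slave")
print(total / 1e6, "MB in total")
```

With these parameters each of the 15 slaves holds a 120x200x150 patch of
14.4 MB, and the patches add up to 216 MB, which matches the stated file
size. In Method 1 the master has to hold that whole amount at once, which
is consistent with the memory limit John mentions.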

--
--------------------------------------------------------------------------------
                                             _
                                           )/___                        _---_
                                         =_/(___)_-__                  (     )
                                        / /\\|/O[]/  \c             O   (   )
Reiner Vogelsang                        \__/ ----'\__/       ..o o O .o  -_-
Senior System Engineer

Silicon Graphics GmbH                   Home Office
Am Hochacker 3
D-85630 Grasbrunn                       52428 Juelich
Germany

Phone   +49-89-46108-0                  +49-2461-939265
Fax     +49-89-46108-222                +49-2461-939266
Mobile  +49-171-3583208
email    reiner at sgi.com