bad performance

Fri Nov 20 09:36:48 CST 2009

On Fri, Nov 20, 2009 at 04:32:35PM +0100, Joeckel, Patrick wrote:
> Dear parallel-netCDF developers and -users,
> 
> I recently finished a test implementation of
> parallel-netCDF in our Chemistry-Climate-Model
> using the recent version 1.0.3 of the library.
> 
> Running it on an IBM Power6 system
> 
> uname -a
> Linux p6012 2.6.16.60-0.42.7.1-ppc64 #1 SMP Tue Nov 3 12:20:42 CET
> 2009 ppc64 ppc64 ppc64 GNU/Linux
> 
> on 64 CPUs (i.e. 2 nodes), I detected that our "classical" gathering
> on one CPU and output through the serial netCDF interface
> is - for a typical application - 4.5 times faster (wall clock for
> I/O output time step) than the parallel-netCDF implementation.
> 
> I guess (I hope !) that I am doing something wrong and further that
> you can help me.
> I appreciate any suggestion and I am happy to provide more
> information that can help me to solve the problem.

Hi Patrick.

Can you send 'ncmpidump -h' or 'ncdump -h' of a typical dataset?

How is the data decomposed across these 64 CPUs? Do you write out entire
variables at a time, or faces, or sub-cubes?
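
To make the decomposition question concrete, here is a minimal sketch
in Python of how each rank's start/count vectors might be computed for
a simple block decomposition. The grid size (90 x 64 x 128), the
1 x 8 x 8 process grid, and the function names are made-up values for
illustration only; they are not taken from the model in question. A
start/count pair like this is what a subarray write such as
ncmpi_put_vara_float_all() takes for each process.

```python
def block_decompose(n, parts, index):
    """Return (start, count) of the index-th of `parts` near-equal blocks of n items."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each get one additional element.
    count = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return start, count


def subarray_for_rank(rank, dims, proc_grid):
    """Map a rank to (start, count) vectors on a row-major process grid."""
    coords = []
    for p in reversed(proc_grid):
        coords.append(rank % p)
        rank //= p
    coords.reverse()
    pairs = [block_decompose(n, p, c) for n, p, c in zip(dims, proc_grid, coords)]
    start = [s for s, _ in pairs]
    count = [c for _, c in pairs]
    return start, count


# Hypothetical (lev, lat, lon) variable on 64 ranks arranged as 1 x 8 x 8.
dims = (90, 64, 128)
proc_grid = (1, 8, 8)

for rank in (0, 1, 63):
    print(rank, subarray_for_rank(rank, dims, proc_grid))
```

With a layout like this, every rank writes a sub-cube that is split
along the two fastest-varying dimensions, so each rank's data lands in
many small, noncontiguous pieces of the file. That kind of detail is
why the shape of the decomposition matters for the questions above.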

Are you writing to GPFS, NFS, or some other file system?

This is Linux, so which MPI library are you using: IBM's PE, MPICH2,
or OpenMPI?

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA