p-netcdf or netcdf4?

Rob Latham robl at mcs.anl.gov
Tue Jun 15 15:02:55 CDT 2010


On Tue, Jun 15, 2010 at 08:43:32PM +0100, michael wrote:
> Rob, glad to see it's far from contentious. Just to clarify that page
> (noting I'm not an expert in the diffs from netcdf v3 to v4...):
> 
>  if I have netcdf v3 installed I can
>    (a) replace some netcdf (v3) calls with p-netcdf calls, link to
> p-netcdf, and get some parallelism (for an underlying parallel file sys)

There's a little bit more to it, but you've got the general idea.
You also need a way to decompose your variables across all N processes
to actually get any parallelism.    

Well, that's true no matter which library or approach you use to get
parallel I/O.  The "vara" (subarray) and "vars" (strided subarray)
routines are used extensively for this purpose.
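
To make the decomposition idea concrete, here is a minimal sketch of how
each of N processes might compute the (start, count) pair it would hand
to a "vara" call for one dimension of a variable. The helper name
block_decompose and the sizes used are hypothetical, chosen only for
illustration; no MPI or netcdf library is called, and a real code would
obtain rank and nprocs from MPI.

```python
def block_decompose(global_len, nprocs, rank):
    """Return (start, count) for this rank's contiguous slice of one dimension.

    Hypothetical helper for illustration; not part of any netcdf library.
    """
    base, extra = divmod(global_len, nprocs)
    # The first `extra` ranks each take one additional element,
    # so the slices cover the dimension with no gaps or overlap.
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


if __name__ == "__main__":
    nprocs = 4        # assumed number of processes
    global_len = 10   # assumed length of the dimension being split
    for rank in range(nprocs):
        start, count = block_decompose(global_len, nprocs, rank)
        print(f"rank {rank}: start={start} count={count}")
```

Each process would then pass its own start and count to a collective
"vara" routine (for example ncmpi_put_vara_float_all in parallel-netcdf),
so that together the processes write the whole variable.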

>  if I have netcdf v4 installed I can
>    (a) amend my code from v3 to v4 then use the HDF5 "layer" to get some
> parallelism

If you don't do anything about how you split data among your parallel
processors, then the HDF5 layer only relaxes file format limitations.

>    (b) amend my code from v3 to v4, ensure I've p-netcdf installed, then
> use the p-netcdf "layer" to get some parallelism
> 
> And I think Jim's just said (b) uses 10 times less memory than (a) but I
> didn't pick up if there's much difference in I/O speed?

The file format shared by parallel-netcdf and traditional netcdf (v3)
is a good deal simpler than the file format shared by HDF5 and
netcdf (v4).  The more sophisticated file format allows for more
flexibility, but with some cost.   

A factor of 10 sounds kind of high, but bear in mind that this is a
relative comparison: going through pnetcdf adds almost no memory
overhead, so even a small additional cache will look relatively large
next to it.

I'm glad you're asking these important questions, and I hope I'm
explaining everything adequately.   Do not hesitate to follow up if
anything remains unclear.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

