[gdsjaar at sandia.gov: [netcdfgroup] strlen calls in NC_finddim and NC_findvar]
Rob Latham
robl at mcs.anl.gov
Fri Dec 4 09:10:35 CST 2009
Greg S. found something noteworthy on the serial netcdf list. We do
something similar (not surprising: I'm sure our NC_finddim and
NC_findvar functions are 99% unchanged from serial netcdf).
In NC_finddim we have a call to strlen as part of the condition of a
for loop. If there are a lot of dimensions as in Greg's case, then
yeah, we too would call strlen a lot.
http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/src/lib/dim.c#L135
Our ncmpii_NC_findvar calls strlen inside a loop for each variable in
a dataset.
http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/src/lib/var.c#L317
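For anyone who doesn't want to click through, here is a minimal sketch of
the pattern being described. It is not the library source: the name_entry
type and the find_by_name_strlen function are made up for illustration.
The search key is measured once, but every stored name is re-measured
with strlen as the loop walks the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stored dimension or variable name. */
typedef struct {
    const char *cp;   /* NUL-terminated name */
} name_entry;

/* Linear search that re-measures each stored name with strlen().
 * Returns the index of the first match, or -1 if there is none. */
int find_by_name_strlen(const name_entry *entries, size_t nelems,
                        const char *name)
{
    assert(entries != NULL || nelems == 0);
    assert(name != NULL);

    size_t slen = strlen(name);   /* the search key is measured once */
    size_t i;

    /* strlen() in the loop condition reads each visited name in full. */
    for (i = 0; i < nelems
             && (strlen(entries[i].cp) != slen
                 || strncmp(entries[i].cp, name, slen) != 0);
         i++)
        ;

    return (i < nelems) ? (int)i : -1;
}
```

With a handful of names this is harmless. With thousands of names, and a
lookup every time a name is defined or queried, the repeated strlen calls
add up.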
How common are datasets with thousands of dimensions and thousands of
variables?
In a followup message, Greg found at least one case where "size" was
not the same as strlen(name) for one of these NC_dim types, so it
looks like the easy optimization won't work out after all.
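One way to hunt for such mismatches is to scan every stored name and
compare the cached length against strlen. The sketch below assumes a
hypothetical counted_name type with a cached length and a NUL-terminated
buffer; it is not the library's struct or an existing function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: cached length plus the bytes themselves. */
typedef struct {
    size_t nchars;    /* cached length */
    const char *cp;   /* NUL-terminated bytes */
} counted_name;

/* Count the entries whose cached length disagrees with strlen(). */
size_t count_length_mismatches(const counted_name *names, size_t nelems)
{
    assert(names != NULL || nelems == 0);

    size_t bad = 0;
    for (size_t i = 0; i < nelems; i++) {
        if (names[i].nchars != strlen(names[i].cp))
            bad++;
    }
    return bad;
}
```

A nonzero count on a real dataset would mean the cached length cannot be
swapped in for strlen without first understanding where the two diverge.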
The status quo isn't awful if you've got a small number of dimensions
and variables. If anybody else has a dataset like Greg's, though,
reply to this email and we'll put optimizing this workload on the todo
list.
thanks
==rob
----- Forwarded message from Greg Sjaardema <gdsjaar at sandia.gov> -----
Sender: netcdfgroup-bounces at unidata.ucar.edu
From: Greg Sjaardema <gdsjaar at sandia.gov>
Subject: [netcdfgroup] strlen calls in NC_finddim and NC_findvar
Date: Thu, 3 Dec 2009 15:41:49 -0700
Message-ID: <4B183EAD.20808 at sandia.gov>
User-Agent: Thunderbird 2.0.0.23 (X11/20090812)
To: "netcdfgroup at unidata.ucar.edu" <netcdfgroup at unidata.ucar.edu>
I have a monstrous file with several thousand dimensions and variables
which is running slower than it should. I investigated the runtime
and found that strlen was the major time user in the NC_finddim and
NC_findvar calls. The obvious optimization was to cache the length of
the name instead of calling strlen each time. However, when I went to
do this, I discovered that the length is already cached as the nchars
field in the NC_string struct.
I did some checks in the code and also added some assertions to the
code and verified that, as far as I can tell, nchars is the correct
length of the string. Is there a reason that it isn't used and
strlen() is called instead? Switching the code to use nchars dropped
my execution time from 20 units to 6 units. I would like to make the
switch, but wondered if there was some strange corner case where the
nchars value is incorrect and will cause problems.
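A minimal sketch of the switch described above, under stated assumptions:
the cached_name type and find_by_name_cached function are hypothetical,
and only the nchars field name comes from the message. The lookup compares
the cached length first and touches the bytes only when the lengths match.
The assert mirrors the kind of check mentioned above and disappears when
the code is built with NDEBUG defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string modeled on the description above. */
typedef struct {
    size_t nchars;    /* cached length of the name */
    const char *cp;   /* NUL-terminated bytes (field name is an assumption) */
} cached_name;

/* Linear search that trusts the cached length instead of calling strlen()
 * on every stored name. Returns the matching index, or -1 if none. */
int find_by_name_cached(const cached_name *entries, size_t nelems,
                        const char *name)
{
    size_t slen = strlen(name);   /* the search key is measured once */

    for (size_t i = 0; i < nelems; i++) {
        /* Debug-only check that the cache agrees with strlen(). */
        assert(entries[i].nchars == strlen(entries[i].cp));

        if (entries[i].nchars == slen
            && memcmp(entries[i].cp, name, slen) == 0)
            return (int)i;
    }
    return -1;
}
```

Note that the assert itself calls strlen, so any timing comparison should
be done on a build with NDEBUG defined.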
Thanks,
--Greg
----- End forwarded message -----
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA