parallel netcdf - first experiences

rene.redler redler at ccrl-nece.de
Tue Jul 29 09:25:05 CDT 2003


Hello Rob,

> The -C in the Makefiles is my fault and I've gotten rid of it (but not
> in time for 0.8.6 ).  Could you tell me more about the 'other odd
> things'?  It may drive me crazy, but we're going to try to get the
> Makefiles to work with as many make implementations as possible.

As already mentioned in my first email, NEC's install uses a different
syntax than the Linux install. Changing those lines to something like

test -d <path/directory> || mkdir -p <path/directory> && chmod 755 <path/directory>
cp -r <dir/files_to_copy> <path/directory> && chmod 644 <path/directory>/<dir/file>

is not very elegant but (I guess) portable, and should run at least under
csh and sh.

In the Makefile in /src/lib, lines 101 to 104 cause some problems. Only the
program is generated, while the library files are not processed. The latter
should be triggered through the include files, but NEC's make is not able to
resolve this properly. The reason is still unclear to me.

The Makefile in src/libf has empty PROGRAM and PROG_CSRCS variables, which
causes line 89 to fail. The reason for this is still unclear to me as well.

But as long as cross-compiling works (which is the case already) I am not 
too worried about this.

> > Nevertheless trying to test_read the netCDF files failed, and the ncmpi
> > routines returned an invalid id or dimension. Did anyone encounter a
> > similar behavior on any other architecture?

I placed some printf calls in the source code to learn more about what is
going wrong.

test_read.c calls ncmpi_open: returned ncid1 = 0
hdr_get_NC (called from ncmpi_open) returns -46=Invalid dimension id or name
NC_computeshapes (called from hdr_get_NC)
NC_var_shape (called from NC_computeshapes)

When printing *ip in the loop starting at line 379, *ip is first 0 and then
16777216, which is obviously a wrong id.
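
A side note on the value itself: 16777216 is 0x01000000, which is the
number 1 with its four bytes in reverse order. A minimal C sketch to check
this follows. swap32 is a hypothetical helper written for this note, not
part of parallel-netcdf.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |   /* lowest byte  -> highest byte */
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);    /* highest byte -> lowest byte  */
}
```

swap32(16777216u) evaluates to 1. This alone does not say whether byte
order or a size mismatch is to blame. It only shows that the bad id looks
like a valid small id whose bytes ended up in the wrong position.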

Up to now I have not been able to identify why dimids[] contains these wrong
values. It could be due to the fact that on the SX pointers are 64 bits
wide, while ints are 32 bits wide.
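
Whatever the cause turns out to be, one way to rule out any dependence on
the native size of int, long or pointers is to assemble each header integer
byte by byte. The classic netCDF file format stores integers as 4-byte
big-endian values, and the sketch below assumes that layout. decode_be32
and decode_ids are hypothetical names used for illustration, not the
routines parallel-netcdf actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one 32-bit value from four big-endian bytes.
 * The result does not depend on host byte order or on sizeof(int). */
uint32_t decode_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Hypothetical helper: decode n consecutive 4-byte ids from buf into out[].
 * The input always advances by exactly 4 bytes per id, whatever the width
 * of the output type. Values above INT_MAX convert in an
 * implementation-defined way, which is acceptable for small ids. */
void decode_ids(const unsigned char *buf, size_t n, int *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = (int)decode_be32(buf + 4 * i);
}
```

For the eight bytes 00 00 00 00 00 00 00 01 and n = 2, decode_ids stores
the ids 0 and 1 in out[].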
 
> On ia32 linux, these tests all pass.  Could you send which tests you
> ran and how you ran them?

So they did on my laptop today. I ran test_int, but the same will probably
happen with test_double and test_float, which I briefly tried out yesterday
as well (without success).
 
> Thanks for helping us shake out problems in our software!

I would rather come up with solutions than with only a list of problems.
But as a 100% Fortran programmer I quickly reach my limits when trying to
identify these kinds of problems in C.

Rene

   _______________________________________________________________

      René Redler
      C&C Research Laboratories
      NEC Europe Ltd.                   Tel: +49 (0)2241 925240
      Rathausallee 10                   Fax: +49 (0)2241 925299 
      53757 Sankt Augustin              URL: www.ccrl-nece.de/~redler
   _______________________________________________________________





