[MOAB-dev] ITAPS and string handling

Jed Brown jed at 59A2.org
Sun Nov 15 11:05:22 CST 2009


In Fortran, the size of the buffer is passed explicitly and the extra
characters are padded with ' '.  There should be no NULL characters.
Neither CGM's iGeom interface nor iMesh_createTag, iMesh_getTagName, or
iMesh_getTagHandle handles the trailing whitespace correctly.
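For concreteness, here is a minimal sketch of the handling I would expect on the C side when a blank-padded Fortran string comes in.  The helper names fortran_trimmed_len and fortran_to_c are hypothetical and exist in neither MOAB nor CGM; this is only an illustration of stripping the padding before the name is used.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of a blank-padded Fortran string once the
 * trailing ' ' padding is ignored.  `len` is the full buffer size. */
size_t fortran_trimmed_len(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Hypothetical helper: copy a Fortran string into a freshly allocated,
 * NUL-terminated C string.  Returns NULL if allocation fails; the caller
 * is responsible for calling free() on the result. */
char *fortran_to_c(const char *s, size_t len)
{
    size_t n = fortran_trimmed_len(s, len);
    char *out = malloc(n + 1);
    if (!out)
        return NULL;
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}
```

Only trailing blanks are removed, so embedded blanks in a name survive.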

In C, ITAPS requires us to explicitly pass the "length", which one might
expect to be the size of the buffer.  But if the buffer size is passed,
std::string(char*,size_t) picks up the NULL character and the undefined
garbage following it in the buffer.  This causes inconsistency,
particularly when mixing C and C++ string handling.
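A small sketch of the mismatch, in C rather than C++: the same buffer has a different "length" depending on whether you trust the declared size or stop at the first NULL.  The helpers effective_len and names_equal are hypothetical names for illustration, not part of any ITAPS implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of meaningful characters in a buffer whose
 * declared size is `len`, stopping at the first NUL if there is one. */
size_t effective_len(const char *s, size_t len)
{
    const char *nul = memchr(s, '\0', len);
    return nul ? (size_t)(nul - s) : len;
}

/* Hypothetical helper: compare two names by their effective contents
 * rather than by the raw bytes of the declared buffers. */
int names_equal(const char *a, size_t a_len, const char *b, size_t b_len)
{
    size_t na = effective_len(a, a_len);
    size_t nb = effective_len(b, b_len);
    return na == nb && memcmp(a, b, na) == 0;
}
```

With a 16-byte buffer holding "density" plus a terminator, effective_len reports 7, whereas a constructor that is handed the pointer and 16 copies all 16 bytes, so the two names compare unequal.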

MOAB and CGM pad with NULL (though not consistently,
e.g. iMesh_getTagName, iMesh_getError, and iGeom_load may not even null
terminate), write a character off the end of the array
(iMesh_getDescription), read off the end of Fortran arrays
(iMesh_createTag and iMesh_getTagHandle), and can leave a junk character
at the end of the array (iMesh_getDescription).  Also, iMesh_setError
can overwrite its field and will reliably fail to null-terminate the
result when called from Fortran.

Even if the obvious string-handling bugs are fixed and we assume that
the Fortran runtime doesn't mind the NULL characters, we still have the
problem of improperly truncating Fortran strings (to ensure that they
are NULL terminated).  That is, a Fortran developer would expect to be
able to allocate a string of exactly the correct length and have all the
characters used.
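What a Fortran caller would expect on output looks roughly like the sketch below: every character of the buffer is usable, the remainder is blank-padded, and no NULL is written.  The name c_to_fortran is hypothetical, and reporting truncation through the return value is my assumption about a reasonable convention.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NUL-terminated `src` into the Fortran buffer
 * `dst`, which holds exactly `dst_len` characters.  The unused tail is
 * filled with ' ' and no NUL is stored.
 * Returns 0 if `src` fit, 1 if it had to be truncated. */
int c_to_fortran(char *dst, size_t dst_len, const char *src)
{
    size_t n = strlen(src);
    int truncated = n > dst_len;
    if (truncated)
        n = dst_len;
    memcpy(dst, src, n);
    memset(dst + n, ' ', dst_len - n);
    return truncated;
}
```

An 8-character buffer therefore holds an 8-character name in full, which is exactly what truncating to make room for a terminator prevents.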

The current state is that the interface is awkward to use from C and
(even if implemented consistently) cannot behave as expected from
Fortran.  I would be strongly in favor of adding one level of
indirection to the calls that involve string handling, thus allowing a
native interface from C and Fortran. [*]

Note that in Fortran the string length is passed by value; therefore the
iGeom_getFaceType() declaration is wrong.  Since calling it is almost
guaranteed to segfault, I suspect that this function has never been
called from Fortran with any implementation.
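To illustrate the kind of indirection I mean, here is a sketch of a Fortran binding wrapping a native C call.  Both Example_lookup and example_lookup_ are hypothetical stand-ins, not ITAPS functions.  The hidden length is declared as int here, which is an assumption: some compilers pass it as size_t, so a real binding would have to match the compiler in use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical native C entry point: takes a NUL-terminated name and
 * returns an error code (0 on success).  Stand-in for a real call. */
int Example_lookup(const char *name, int *handle)
{
    if (strcmp(name, "density") != 0)
        return 1;
    *handle = 42;
    return 0;
}

/* Hypothetical Fortran binding.  Ordinary arguments arrive by reference;
 * the hidden character length is appended by the compiler and arrives
 * BY VALUE (assumed to be int here). */
void example_lookup_(const char *name, int *handle, int *err, int name_len)
{
    char buf[256];
    size_t n = name_len > 0 ? (size_t)name_len : 0;

    /* Strip the Fortran blank padding. */
    while (n > 0 && name[n - 1] == ' ')
        n--;
    if (n >= sizeof buf) {  /* name too long for the scratch buffer */
        *err = 2;
        return;
    }
    memcpy(buf, name, n);
    buf[n] = '\0';
    *err = Example_lookup(buf, handle);
}
```

The C caller gets an ordinary NUL-terminated interface, the Fortran caller gets blank-padded semantics, and neither has to know about the other's convention.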

Jed


[*] Actually, I would put the indirection in for every call from Fortran
because it's more pleasant to use a native interface and the runtime
cost of wrappers like

  void foo_(double *a, int *b, double *c, int *d, int *e) { *e = Foo(a, *b, c, *d); }

is very small.

On my machine the fastest calling convention is to pass by value and
return the error code.  It costs four extra cycles to pass by reference
(i.e. to call foo_() with Foo() inlined instead of calling Foo()
directly), and an additional six if Foo() is not inlined.  The exact
counts are sensitive to stack alignment, but it will always be less than
10 cycles, and it would take a very contrived ITAPS use case for this to
be measurable.
