[MOAB-dev] ITAPS and string handling

Tim Tautges tautges at mcs.anl.gov
Tue Nov 17 15:36:08 CST 2009


[cc'ing tstt-interface too, since it's really an ITAPS question...]

General statement: I've wondered about string handling from time to time too.  The reason it hasn't 
really come up much is that very few people, inside or outside the ITAPS project, use the interfaces directly from 
Fortran.  MOAB handles some of these issues, mostly as we've encountered them, but not consistently.  Your presumption 
about CGM not being used from Fortran is correct, I believe.

I'm on the fence about how to address this.  Various options, in increasing degree of change, would be:
a. Be more careful about terminating with NULL wherever the string length allows, and about not reading/writing off 
the end of strings.  (Question: does anybody know whether NULLs in strings are handled OK in typical Fortran runtimes?)

b. Change all out-type strings in the interface to be handled more like dynamically-allocated arrays, and be careful in 
implementations to copy them / add termination where necessary.

c. Use wrapper functions.

In general I'm opposed to wrappers, and I'd resist that strongly at this point in ITAPS.  Between a and b above, I could 
go either way, depending on how Fortran typically handles NULL in strings.  I've always thought fixed-length string 
arguments were a pain, but at the time I proposed them I didn't think changing them to array-like arguments was 
worth the trouble.
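
For concreteness, here is a minimal sketch of what option (a) might look like inside an implementation.  The helper 
names read_in_string and write_out_string are hypothetical and are not part of iMesh, iGeom, MOAB or CGM.  The 
assumption is that the caller passes the full buffer size as the "length", that a C caller may have NULL-terminated 
early, and that a Fortran caller has blank-padded instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper for in-type strings.  Never reads past buf[len-1].
// Stops at the first NUL if there is one (C caller), then strips trailing
// blanks (Fortran caller).
std::string read_in_string(const char* buf, std::size_t len) {
    assert(buf != nullptr);
    const void* nul = std::memchr(buf, '\0', len);
    std::size_t n = nul
        ? static_cast<std::size_t>(static_cast<const char*>(nul) - buf)
        : len;
    while (n > 0 && buf[n - 1] == ' ') --n;
    return std::string(buf, n);
}

// Hypothetical helper for out-type strings.  Never writes past buf[len-1].
// Adds a NUL terminator only where the length allows, and returns the
// number of characters of `value` that were copied.
std::size_t write_out_string(const std::string& value, char* buf, std::size_t len) {
    assert(buf != nullptr);
    const std::size_t n = value.size() < len ? value.size() : len;
    std::memcpy(buf, value.data(), n);
    if (n < len) buf[n] = '\0';
    return n;
}
```

One caveat with this sketch: stripping trailing blanks on input means a C caller cannot pass a name that 
legitimately ends in a blank, so it is a compromise rather than a clean fix.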

- tim

Jed Brown wrote:
> In Fortran, the size of the buffer is passed explicitly and the extra
> characters are padded with ' '.  There should be no NULL characters.
> None of CGM's iGeom interface, nor iMesh_createTag, iMesh_getTagName, or
> iMesh_getTagHandle correctly handles the trailing whitespace.
> 
> In C, ITAPS requires us to explicitly pass the "length" which one might
> expect to be the size of the buffer.  Then std::string(char*,size_t)
> picks up the NULL character and undefined garbage following it in the
> buffer.  This causes inconsistency particularly when mixing C and C++
> string handling.
> 
> MOAB and CGM pad with NULL (though not consistently,
> e.g. iMesh_getTagName, iMesh_getError, and iGeom_load may not even null
> terminate), write a character off the end of the array
> (iMesh_getDescription), read off the end of Fortran arrays
> (iMesh_createTag and iMesh_getTagHandle), and can leave a junk character
> at the end of the array (iMesh_getDescription).  Also, iMesh_setError
> can overwrite its field and will reliably fail to null-terminate the
> result when called from Fortran.
> 
> Even if the obvious string-handling bugs are fixed and we assume that
> the Fortran runtime doesn't mind the NULL characters, we still have the
> problem of improperly truncating Fortran strings (to ensure that they
> are NULL terminated).  That is, a Fortran developer would expect to be
> able to allocate a string of exactly the correct length and have all the
> characters used.
> 
> The current state is that the interface is awkward to use from C and
> (even if implemented consistently) cannot behave as expected from
> Fortran.  I would be strongly in favor of adding one level of
> indirection to the calls that involve string handling, thus allowing a
> native interface from C and Fortran. [*]
> 
> Note that in Fortran the string length is passed by value, therefore the
> iGeom_getFaceType() declaration is wrong.  Since it is almost a
> guaranteed seg-fault, I suspect that this function has never been called
> from Fortran with any implementation.
> 
> Jed
> 
> 
> [*] Actually, I would put the indirection in for every call from Fortran
> because it's more pleasant to use a native interface and the runtime
> cost of wrappers like
> 
>   void foo_(double*a,int*b,double*c,int*d,int*e) { *e = Foo(a,*b,c,*d); }
> 
> is very small.
> 
> On my machine the fastest calling convention is to pass by value and
> return the error code.  It costs four extra cycles to pass by reference
> (i.e. to call foo_() with Foo() inlined instead of calling Foo()
> directly), and an additional six if Foo() is not inlined.  The exact
> counts are sensitive to stack alignment, but the total will always be less than
> 10 cycles, and it would take a very contrived ITAPS use case for this to
> be measurable.
> 
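
The std::string(char*,size_t) behaviour described above can be reproduced in a few lines.  This is only an 
illustration; the function names whole_buffer and up_to_nul are made up for the example, and the buffer contents 
stand in for whatever garbage happens to follow the terminator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Takes every byte of the buffer, which is what
// std::string(const char*, size_t) does when handed the buffer size.
std::string whole_buffer(const char* buf, std::size_t size) {
    assert(buf != nullptr);
    return std::string(buf, size);
}

// Stops at the first NUL, but never looks past `size` bytes, so it is
// also safe on a buffer that is completely full and unterminated.
std::string up_to_nul(const char* buf, std::size_t size) {
    assert(buf != nullptr);
    const char* end = std::find(buf, buf + size, '\0');
    return std::string(buf, end);
}
```

With an 8-byte buffer holding "name", a NUL, and three leftover bytes, whole_buffer yields an 8-character string 
that does not compare equal to "name", while up_to_nul yields the 4-character string a C caller would expect.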

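For a call that takes a string, the one level of indirection proposed above might look roughly like the sketch 
below.  Everything here is hypothetical: Native_createTag, native_createtag_ and g_last_tag are invented for the 
example and are not real ITAPS functions.  It assumes the common Fortran convention of a lowercase symbol with a 
trailing underscore, ordinary arguments passed by reference, and the character length appended by value; the 
integer type of that hidden length differs between compilers, so a real wrapper would have to check it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Records the last name seen, purely so the example has an observable effect.
std::string g_last_tag;

// Hypothetical native entry point: takes an ordinary string and returns
// an error code (0 on success, 1 for an empty name).
int Native_createTag(const std::string& name) {
    if (name.empty()) return 1;
    g_last_tag = name;
    return 0;
}

// Hypothetical Fortran-callable wrapper.  `name_len` is the hidden length
// argument, assumed here to be an int passed by value after the explicit
// arguments.
extern "C" void native_createtag_(const char* name, int* err, int name_len) {
    assert(name != nullptr && err != nullptr && name_len >= 0);
    std::size_t n = static_cast<std::size_t>(name_len);
    while (n > 0 && name[n - 1] == ' ') --n;  // drop Fortran blank padding
    *err = Native_createTag(std::string(name, n));
}
```

The point of the design is that the blank-padding logic lives in exactly one place, the Fortran caller can use 
every character of its string, and the native function never sees a length argument at all.
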
-- 
================================================================
"You will keep in perfect peace him whose mind is
   steadfast, because he trusts in you."               Isaiah 26:3

              Tim Tautges            Argonne National Laboratory
          (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
          phone: (608) 263-8485      1500 Engineering Dr.
            fax: (608) 263-4499      Madison, WI 53706


