[MOAB-dev] ITAPS and string handling

Tue Nov 17 20:57:11 CST 2009

I'll tell you what we do, since I think it's pretty cool...

Doxygen, which you use to generate your docs (at least for ITAPS), can
dump out an XML description of the documented functions.  We read that
into another program that automatically generates wrappers for other
languages (we have done Python, Lua, and Java); it writes the code that
takes care of all the conversion issues between languages.  This does
require the interface to be written consistently (which is good anyhow)
and occasionally requires hints to the generator, which can be given
with Doxygen comment tags that only show up in the XML.

If the objection to wrappers is maintaining them, this kind of
approach does solve that.  If the objection is the extra cycles spent
on another function call at run time, then just ignore this (our
target is mainly integration with higher-level functionality, so this
is fine for us).

mark

On Nov 17, 2009, at 4:36 PM, Tim Tautges wrote:

> [cc'ing tstt-interface too, since it's really an itaps question...]
>
> General statement: I've wondered about this issue (string handling)
> from time to time too.  The reason it hasn't really come up much is
> that very few people, inside or outside the ITAPS project, use the
> interfaces directly from Fortran.  MOAB handles these issues
> sometimes, mostly as we've encountered them, but not consistently.
> Your presumption that CGM is not used from Fortran is correct, I
> believe.
>
> I'm on the fence about how to address this.  Various options, in  
> increasing degree of change, would be:
> a. Be more careful to terminate with NULL everywhere the string
> length allows, and never read or write past the end of a string.
> (Question: does anybody know whether NULLs in strings are handled OK
> by typical Fortran runtimes?)
>
> b. Change all out-type strings in the interface to be handled more  
> like dynamically-allocated arrays, and be careful in implementations  
> to copy them / add termination where necessary.
>
> c. Use wrapper functions.
>
> In general I'm opposed to wrappers, and I'd resist them strongly at
> this point in ITAPS.  Between a and b above, I could go either way,
> depending on how Fortran typically handles NULL in strings.  I've
> always thought that passing string arguments as fixed-length buffers
> was a pain, but when I proposed them I didn't think changing them to
> array-like arguments was worth the trouble.
>
> - tim
>
> Jed Brown wrote:
>> In Fortran, the size of the buffer is passed explicitly and the extra
>> characters are padded with ' '.  There should be no NULL characters.
>> Neither CGM's iGeom interface nor iMesh_createTag, iMesh_getTagName,
>> or iMesh_getTagHandle correctly handles the trailing whitespace.
>>
>> In C, ITAPS requires us to explicitly pass the "length", which one
>> might expect to be the size of the buffer.  Then
>> std::string(char*,size_t) picks up the NULL character and the
>> undefined garbage following it in the buffer.  This causes
>> inconsistency, particularly when mixing C and C++ string handling.
>>
>> MOAB and CGM pad with NULL (though not consistently; e.g.
>> iMesh_getTagName, iMesh_getError, and iGeom_load may not even
>> null-terminate), write a character off the end of the array
>> (iMesh_getDescription), read off the end of Fortran arrays
>> (iMesh_createTag and iMesh_getTagHandle), and can leave a junk
>> character at the end of the array (iMesh_getDescription).  Also,
>> iMesh_setError can overwrite its field and will reliably fail to
>> null-terminate the result when called from Fortran.
>>
>> Even if the obvious string-handling bugs are fixed and we assume that
>> the Fortran runtime doesn't mind the NULL characters, we still have
>> the problem of improperly truncating Fortran strings (to ensure that
>> they are NULL-terminated).  That is, a Fortran developer would expect
>> to be able to allocate a string of exactly the correct length and
>> have all the characters used.
>>
>> The current state is that the interface is awkward to use from C and
>> (even if implemented consistently) cannot behave as expected from
>> Fortran.  I would be strongly in favor of adding one level of
>> indirection to the calls that involve string handling, thus allowing
>> a native interface from both C and Fortran. [*]
>>
>> Note that in Fortran the string length is passed by value; therefore
>> the iGeom_getFaceType() declaration is wrong.  Since calling it is an
>> almost guaranteed seg-fault, I suspect that this function has never
>> been called from Fortran with any implementation.
>> Jed
>>
>> [*] Actually, I would put the indirection in for every call from
>> Fortran, because a native interface is more pleasant to use and the
>> runtime cost of wrappers like
>>
>>     void foo_(double*a,int*b,double*c,int*d,int*e) { *e = Foo(a,*b,c,*d); }
>>
>> is very small.
>>
>> On my machine the fastest calling convention is to pass by value and
>> return the error code.  It costs four extra cycles to pass by
>> reference (i.e. to call foo_() with Foo() inlined instead of calling
>> Foo() directly), and an additional 6 if Foo() is not inlined.  The
>> exact counts are sensitive to stack alignment, but it will always be
>> less than 10 cycles, and it would take a very contrived ITAPS use
>> case for this to be measurable.
>
> -- 
> ================================================================
> "You will keep in perfect peace him whose mind is
>  steadfast, because he trusts in you."               Isaiah 26:3
>
>             Tim Tautges            Argonne National Laboratory
>         (tautges at mcs.anl.gov)      (telecommuting from UW-Madison)
>         phone: (608) 263-8485      1500 Engineering Dr.
>           fax: (608) 263-4499      Madison, WI 53706
>