[ExM Users] mkstatic questions

Ketan Maheshwari ketan at mcs.anl.gov
Fri May 30 13:18:39 CDT 2014


Thanks! To narrow down and rule out application-ADLB MPI issues, I am now
trying to rebuild the application with bgxlc++ instead of bgmpixlcxx, with
which it was originally built. Will keep you posted on how things work out.
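
For the archive, here is a minimal sketch of the kind of struct-size
disagreement Tim describes below. It is not ADLB code: the header layout
(a 1-byte tag plus an 8-byte length) and the names header_size and
sizes_agree are assumptions made up for illustration. It uses Python's
struct module to stand in for two builds that lay out the same fields
differently.

```python
import struct

# Hypothetical message header: a 1-byte type tag followed by an 8-byte length.
# "@" uses the platform's native alignment, so padding follows the tag.
# "=" uses standard sizes with no padding, standing in for a file that was
# compiled with different packing settings.
NATIVE_LAYOUT = "@BQ"
PACKED_LAYOUT = "=BQ"


def header_size(layout: str) -> int:
    """Return the number of bytes one header occupies under the given layout."""
    return struct.calcsize(layout)


def sizes_agree(sender_layout: str, receiver_layout: str) -> bool:
    """Return True if sender and receiver expect the same number of bytes."""
    return header_size(sender_layout) == header_size(receiver_layout)


if __name__ == "__main__":
    print("native:", header_size(NATIVE_LAYOUT), "bytes")
    print("packed:", header_size(PACKED_LAYOUT), "bytes")
    print("sizes agree:", sizes_agree(PACKED_LAYOUT, NATIVE_LAYOUT))
```

If the two sides of a send/receive pair were built with layouts that differ
like this, the receiver would post a buffer of one size while the sender
transmits another, which is the sort of mismatch an MPI message-size error
would point to.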


On Fri, May 30, 2014 at 12:49 PM, Tim Armstrong <tim.g.armstrong at gmail.com>
wrote:

>  I see.  Based on the two lines of output you sent me, the problem is
> something to do with message sizes on MPI.  Assuming your app isn't using
> MPI internally, it's probably some communication that ADLB is doing.  The
> error message would generally be caused by a mismatch of message size
> between sender and receiver.  The most likely explanation in the ADLB
> codebase is that the sender and receiver somehow disagree on sizes of
> structs, which doesn't make a whole lot of sense unless something strange
> was done during the build process, e.g. one file was compiled with
> different compiler settings, or you somehow linked to different versions of
> the function.
>
> It's possible that it's a bug in the ADLB codebase that has nothing to do
> with how it was built, but it seems unlikely that something like that would
> have escaped all the tests.  It might help to look at the Tcl or Swift code
> that's being run, and to check that it runs correctly in a different
> environment.
>
> It would also be helpful to have a full log of the program output with
> debug logging enabled, since that will tell me what ADLB was doing at the
> time.
>
> I'm not sure if I can help with debugging the problem without more info.
>
>  - Tim
>
>
> On Fri, May 30, 2014 at 11:33 AM, Ketan Maheshwari <ketan at mcs.anl.gov>
> wrote:
>
>> I rebuilt the application recently without MPI. It seems to be working
>> outside of Swift on Cetus compute nodes.
>>
>>
>> On Fri, May 30, 2014 at 11:18 AM, Tim Armstrong <
>> tim.g.armstrong at gmail.com> wrote:
>>
>>>  Regarding the MPI error - that seems strange.  There are multiple
>>> places in the code that it might be.
>>>
>>>  One possible cause is if something funny happened in compiling/linking
>>> - e.g. multiple compilers or versions of things linked together.  Have you
>>> tried running the code locally?
>>>
>>> I'm a little perplexed because MPI tag 4 shouldn't be used in your
>>> application - the message type (Iget) is only really used for
>>> gemtc/coasters applications.  It would be helpful to debug further if I
>>> could get a log from the run with ADLB debugging enabled at compile time
>>> (--enable-log-debug for the ADLB configure stage, or setting EXM_DEBUG=1 in
>>> exm-settings.sh depending on how you built it).
>>>
>>>  - Tim
>>>
>>
>>
>