[Darshan-users] [EXTERNAL] Re: Darshan error on Cray system with static compilation

Carns, Philip H. carns at mcs.anl.gov
Tue Jul 28 09:04:05 CDT 2020


That's right.  That particular code has moved around a bit over the years to accommodate refactoring (particularly in the 2.x to 3.x transition), but it hasn't fundamentally changed in a long time as far as I'm aware.

There is actually a brief mention of what's going on in that path in a 2009 workshop paper: https://www.mcs.anl.gov/uploads/cels/papers/P1660.pdf (look for the part about tsearch).

thanks,
-Phil
________________________________
From: Markomanolis, George <markomanolig at ornl.gov>
Sent: Tuesday, July 28, 2020 9:37 AM
To: Carns, Philip H. <carns at mcs.anl.gov>; darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: Re: [EXTERNAL] Re: Darshan error on Cray system with static compilation


Hi Phil,



Thanks for the answer. I assume this code has been there for ages, right? They asked me whether they should try a previous version. I was not keen on that, but I wanted to ask you in case this was something new; from your reply, I understand that it is not.



Regards,

George



From: "Carns, Philip H." <carns at mcs.anl.gov>
Date: Saturday, July 25, 2020 at 2:24 PM
To: "Markomanolis, George" <markomanolig at ornl.gov>, "darshan-users at lists.mcs.anl.gov" <darshan-users at lists.mcs.anl.gov>
Subject: [EXTERNAL] Re: Darshan error on Cray system with static compilation



Hi George,



We've never seen that assertion triggered before as far as I know (it's just defensive programming, not something that is supposed to happen).  It indicates that Darshan observed inconsistent results out of a binary search tree, possibly brought on by a memory corruption of some sort.
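
For readers unfamiliar with this kind of check, here is a minimal sketch of the pattern. It is not Darshan's actual code: the names val_counter, val_counter_cmp, and track_val are hypothetical, and the only thing assumed from the thread is that the tree is managed with the POSIX tsearch family. The idea is that a lookup which just missed must be followed by an insert that returns the very record being added; anything else means the tree gave inconsistent answers.

```c
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one distinct value and how often it was seen. */
struct val_counter {
    int64_t val;
    int64_t freq;
};

/* Order records by value only; the frequency is payload. */
int val_counter_cmp(const void *a, const void *b)
{
    const struct val_counter *x = a;
    const struct val_counter *y = b;
    return (x->val > y->val) - (x->val < y->val);
}

/* Record one observation of val in the tree rooted at *root.
 * Returns the updated frequency, or -1 if allocation fails. */
int64_t track_val(void **root, int64_t val)
{
    struct val_counter key = { .val = val, .freq = 0 };

    /* tfind returns a pointer to the stored item pointer, or NULL. */
    void *node = tfind(&key, root, val_counter_cmp);
    if (node != NULL) {
        struct val_counter *hit = *(struct val_counter **)node;
        return ++hit->freq;
    }

    struct val_counter *counter = malloc(sizeof(*counter));
    if (counter == NULL)
        return -1;
    counter->val = val;
    counter->freq = 1;

    node = tsearch(counter, root, val_counter_cmp);
    if (node == NULL) {
        free(counter);
        return -1;
    }

    /* Defensive check: tfind just missed, so tsearch must have
     * inserted our record rather than returning an existing one. */
    struct val_counter *found = *(struct val_counter **)node;
    assert(found == counter);
    return found->freq;
}
```

In a healthy process the assertion in this sketch can never fire, which is why a failure of this kind points at either a logic bug in the surrounding code or at memory that was overwritten between the lookup and the insert.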



Unfortunately I'm not sure what to suggest on this one; we might need more information or a reproducer.



The application might have an I/O workload that triggers a buggy code path in Darshan.  It's also plausible that there is a memory corruption outside of Darshan (in the application or another library) that is just impacting that Darshan data structure by chance.



-Phil



________________________________

From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Markomanolis, George <markomanolig at ornl.gov>
Sent: Thursday, July 23, 2020 2:36 PM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] Darshan error on Cray system with static compilation



Hi,



I am forwarding an error that we can’t reproduce. It happens only sometimes, and it is on a system that I don’t even have access to, but they informed me about this error:



fms_MOM6_SIS2_compile.x: lib/darshan-common.c:262: darshan_track_common_val_counters: Assertion `found == counter' failed.
forrtl: error (76): Abort trap signal
Image PC Routine Line Source



This is a Cray system with static compilation. This error kills the application. Do you have any idea what could cause it, or is it difficult to say with such minimal information?



Regards,

George

