[Darshan-users] [EXTERNAL] Re: Darshan error on Cray system with static compilation

Markomanolis, George markomanolig at ornl.gov
Tue Jul 28 08:37:39 CDT 2020


Hi Phil,

Thanks for the answer. I assume this code has been there for ages, right? They asked me whether they should try a previous version. I was not very enthusiastic about that, but I thought it better to ask you whether this is something new. From your reply, I understand that it’s not.

Regards,
George

From: "Carns, Philip H." <carns at mcs.anl.gov>
Date: Saturday, July 25, 2020 at 2:24 PM
To: "Markomanolis, George" <markomanolig at ornl.gov>, "darshan-users at lists.mcs.anl.gov" <darshan-users at lists.mcs.anl.gov>
Subject: [EXTERNAL] Re: Darshan error on Cray system with static compilation

Hi George,

We've never seen that assertion triggered before, as far as I know (it's just defensive programming, not something that is supposed to happen). It indicates that Darshan observed inconsistent results from a binary search tree, possibly brought on by memory corruption of some sort.
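To illustrate what a "defensive" assertion of this kind guards against, here is a small C sketch. It is a hypothetical reconstruction of the general pattern, not Darshan's actual `darshan_track_common_val_counters` code. The names `val_counter`, `val_compare`, and `track_value` are invented for illustration, and the sketch assumes a tree built with the POSIX `tsearch()`/`tfind()` functions. Once a lookup has reported a value as absent, inserting a freshly allocated record must hand back that same record. Any other result means the tree is contradicting itself, which is the situation a `found == counter` style check catches.

```c
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counter record: one observed value and how often it was seen. */
struct val_counter {
    int64_t val;
    int64_t freq;
};

/* Order records by value so tsearch()/tfind() can maintain a binary tree. */
static int val_compare(const void *a, const void *b)
{
    const struct val_counter *x = a;
    const struct val_counter *y = b;
    if (x->val < y->val) return -1;
    if (x->val > y->val) return 1;
    return 0;
}

/* Record one occurrence of `val` in the tree rooted at *root and
 * return the updated frequency for that value. */
static int64_t track_value(void **root, int64_t val)
{
    struct val_counter key = { .val = val, .freq = 0 };
    void *found = tfind(&key, root, val_compare);
    if (found) {
        /* Already tracked: bump the existing record's count. */
        struct val_counter *existing = *(struct val_counter **)found;
        existing->freq++;
        return existing->freq;
    }

    /* Not in the tree yet: allocate a new record and insert it. */
    struct val_counter *counter = malloc(sizeof *counter);
    assert(counter != NULL);
    counter->val = val;
    counter->freq = 1;

    found = tsearch(counter, root, val_compare);
    assert(found != NULL);
    /* Defensive check: tfind() just reported the value as absent, so
     * tsearch() must have stored *this* record. A different pointer means
     * the tree gave inconsistent answers (e.g. a corrupted node or an
     * inconsistent comparison function). */
    assert(*(struct val_counter **)found == counter);
    return counter->freq;
}
```

Starting from `void *root = NULL;`, calling `track_value(&root, 42)` twice returns 1 and then 2. The assertion only detects the inconsistency and says nothing about its cause: it could be a bug in the tracking code, or memory corruption elsewhere in the process that happens to damage the tree.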

Unfortunately I'm not sure what to suggest on this one; we might need more information or a reproducer.

The application might have an I/O workload that triggers a buggy code path in Darshan.  It's also plausible that there is a memory corruption outside of Darshan (in the application or another library) that is just impacting that Darshan data structure by chance.

-Phil

________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Markomanolis, George <markomanolig at ornl.gov>
Sent: Thursday, July 23, 2020 2:36 PM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] Darshan error on Cray system with static compilation


Hi,

I am passing along an error that we can’t reproduce. It happens only occasionally, on a system I don’t even have access to, but they informed me about this error:

fms_MOM6_SIS2_compile.x: lib/darshan-common.c:262: darshan_track_common_val_counters: Assertion `found == counter' failed.
forrtl: error (76): Abort trap signal
Image PC Routine Line Source

This is a Cray system with static compilation, and the error kills the application. Do you have any idea what might cause it, or is it hard to tell with so little information?

Regards,
George

