[Darshan-users] [EXTERNAL] Re: Darshan error on Cray system with static compilation
Markomanolis, George
markomanolig at ornl.gov
Tue Jul 28 08:37:39 CDT 2020
Hi Phil,
Thanks for the answer. I assume this code has been there for ages, right? They asked me whether they should try a previous version. I was not excited about that, but I wanted to check with you in case the assertion was something new. From your words, I understand that it is not.
Regards,
George
From: "Carns, Philip H." <carns at mcs.anl.gov>
Date: Saturday, July 25, 2020 at 2:24 PM
To: "Markomanolis, George" <markomanolig at ornl.gov>, "darshan-users at lists.mcs.anl.gov" <darshan-users at lists.mcs.anl.gov>
Subject: [EXTERNAL] Re: Darshan error on Cray system with static compilation
Hi George,
We've never seen that assertion triggered before as far as I know (it's just defensive programming, not something that is supposed to happen). It indicates that Darshan observed inconsistent results out of a binary search tree, possibly brought on by a memory corruption of some sort.
Unfortunately I'm not sure what to suggest on this one; we might need more information or a reproducer.
The application might have an I/O workload that triggers a buggy code path in Darshan. It's also plausible that there is a memory corruption outside of Darshan (in the application or another library) that is just impacting that Darshan data structure by chance.
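To illustrate the kind of defensive check involved, here is a minimal sketch. It is not the actual code at lib/darshan-common.c:262. It assumes the counters live in a POSIX tsearch()/tfind() tree, and the struct and function names (val_counter, val_counter_compare, track_value) are hypothetical. Only the asserted expression, found == counter, is taken from the error message.

```c
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: a value and how often it has been seen. */
struct val_counter {
    int64_t val;
    int64_t freq;
};

/* Order records by value only, as tsearch()/tfind() require. */
static int val_counter_compare(const void *a, const void *b)
{
    const struct val_counter *x = a;
    const struct val_counter *y = b;
    if (x->val < y->val) return -1;
    if (x->val > y->val) return 1;
    return 0;
}

/* Record one occurrence of val in the tree rooted at *root.
 * Returns the updated frequency, or -1 if allocation fails. */
int64_t track_value(void **root, int64_t val)
{
    struct val_counter key = { val, 0 };
    void *node = tfind(&key, root, val_counter_compare);

    if (node != NULL) {
        /* Already tracked: the node stores a pointer to our record. */
        struct val_counter *existing = *(struct val_counter **)node;
        existing->freq++;
        return existing->freq;
    }

    struct val_counter *counter = malloc(sizeof(*counter));
    if (counter == NULL)
        return -1;
    counter->val = val;
    counter->freq = 1;

    node = tsearch(counter, root, val_counter_compare);
    if (node == NULL) {
        free(counter);
        return -1;
    }

    /* tfind() just reported that val is absent, so tsearch() must have
     * inserted our new record. Getting back any other record means the
     * tree is inconsistent, for example because memory was overwritten. */
    struct val_counter *found = *(struct val_counter **)node;
    assert(found == counter);
    return found->freq;
}
```

In a healthy process the assertion can never fire, because the lookup and the insert agree on the contents of the tree. It fires only if the two calls disagree, which is why a corrupted tree node or comparison key is a plausible suspect.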
-Phil
________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Markomanolis, George <markomanolig at ornl.gov>
Sent: Thursday, July 23, 2020 2:36 PM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] Darshan error on Cray system with static compilation
Hi,
I am forwarding an error that we can't reproduce. It happens only sometimes, and on a system that I don't even have access to, but they informed me about it:
fms_MOM6_SIS2_compile.x: lib/darshan-common.c:262: darshan_track_common_val_counters: Assertion `found == counter' failed.
forrtl: error (76): Abort trap signal
Image PC Routine Line Source
This is a Cray system with static compilation. This error kills the application. Do you have any idea what could cause it, or is it difficult to say with so little information?
Regards,
George