<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi Ed,<br>
<br>
      I don't think we've seen this particular error before. Is it the
      same application/executable every time, in addition to being the
      same user in each case?<br>
<br>
That portion of the log file is the index that tells the parser
library where to find data for each of the instrumentation
modules. Every Darshan log uses that index, so it would be
unusual (but of course not impossible by any means!) for it to be
      broken. It is malloc'd deterministically, at the same point every
      time Darshan is initialized:<br>
<br>
<a class="moz-txt-link-freetext" href="https://xgitlab.cels.anl.gov/darshan/darshan/blob/master/darshan-runtime/lib/darshan-core.c#L245">https://xgitlab.cels.anl.gov/darshan/darshan/blob/master/darshan-runtime/lib/darshan-core.c#L245</a><br>
<br>
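      As a rough sketch (plain Python, not part of Darshan; the helper
      names and the file size are made up for illustration), here is one
      way to sanity-check the two entries from your gdb output. It
      assumes each entry is an (off, len) pair of unsigned 64-bit values,
      which is what the gdb output suggests:<br>
      <br>
<pre>
```python
# Values copied from the gdb output in the quoted message.
# Everything else here (names, file size) is hypothetical.
ENTRIES = {
    "name_map": (0x1FC, 0xFFFFFFFFFFFFFE04),
    "mod_map[6]": (0x277, 0xFFFFFFFFFFFFFD89),
}

ASSUMED_FILE_SIZE = 10**9  # placeholder; use the real log size in practice


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return value - 2**64 if value >= 2**63 else value


def entry_is_plausible(off, length, file_size):
    """True if the region [off, off + length) fits inside the file."""
    return file_size >= off + length


for name, (off, length) in ENTRIES.items():
    print(name,
          "off =", off,
          "len as signed =", as_signed64(length),
          "off + len mod 2**64 =", (off + length) % 2**64,
          "plausible =", entry_is_plausible(off, length, ASSUMED_FILE_SIZE))
```
</pre>
      For both entries the length, read as a signed value, is exactly
      the negative of the offset (-508 and -631), so off + len wraps
      around to 0. That is only arithmetic on the two values you posted;
      it doesn't tell us what wrote them.<br>
      <br>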
If it is the same executable triggering the problem in each case,
then I would be suspicious of a stack overflow or some other
memory corruption in the application that just happens to cause
collateral damage in the address range that this malloc is
getting. <br>
<br>
      Unfortunately that's the kind of thing that's hard to isolate
      after the fact; all we know for sure is that the log is broken.
      If you have access to the source code, though, it sounds like it
      might be pretty reproducible at run time.<br>
<br>
thanks,<br>
-Phil<br>
<br>
On 11/29/2017 12:14 PM, Ed Karrels wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CA+xfkFsptojh+iqbvLj+UoqKros_9GQ2QcMyw7d2e=qdm+3kEw@mail.gmail.com">
<div dir="ltr">
<div>
<div>
<div>I'm scanning through Darshan logs from Blue Waters, and
              darshan-parser fails on a bunch (1588) of log files. They're
all Darshan version 3 files, and all from the same user.
Every one of this user's Darshan version 3 files fails.
Their Darshan version 2 files are fine.<br>
<br>
</div>
I ran darshan-parser in a debugger, and found that the
            headers seem to have a couple of garbage entries.<br>
<br>
</div>
After the call to darshan_log_get_job(), the "len" fields in
fd->name_map and fd->mod_map[6] seem to be invalid:<br>
<br>
(gdb) p /x fd->name_map<br>
$42 = {off = 0x1fc, len = 0xfffffffffffffe04}<br>
(gdb) p /x fd->mod_map[6]<br>
$43 = {off = 0x277, len = 0xfffffffffffffd89}<br>
<br>
</div>
Have you seen errors like these before? Any idea why they're
happening? Since it's only one user, I suspect it's something
in their code, perhaps a failure during MPI_Finalize.<br>
<br>
</div>
<br>
</blockquote>
</body>
</html>