[Darshan-users] Bad data in file headers

Phil Carns carns at mcs.anl.gov
Fri Dec 1 10:24:01 CST 2017


Hi Ed,

I don't think we've seen this particular error before.  Is it also the 
same application/executable every time in addition to being the same 
user in each case?

That portion of the log file is the index that tells the parser library 
where to find data for each of the instrumentation modules.  Every 
Darshan log uses that index, so it would be unusual (but of course not 
impossible by any means!) for it to be broken.  It is always malloc'd
deterministically, at the same point during Darshan initialization:

https://xgitlab.cels.anl.gov/darshan/darshan/blob/master/darshan-runtime/lib/darshan-core.c#L245

If it is the same executable triggering the problem in each case, then I 
would be suspicious of a stack overflow or some other memory corruption 
in the application that just happens to cause collateral damage in the 
address range that this malloc is getting.

Unfortunately, that's the kind of thing that's hard to isolate after the
fact; all we know for sure is that the log is broken.  If you have
access to the source code, though, it sounds like it might be pretty
reproducible at run time.

thanks,
-Phil

On 11/29/2017 12:14 PM, Ed Karrels wrote:
> I'm scanning through Darshan logs from Blue Waters, and darshan-parser 
> fails on a bunch (1588) of log files. They're all Darshan version 3
> files, and all from the same user. Every one of this user's Darshan 
> version 3 files fails. Their Darshan version 2 files are fine.
>
> I ran darshan-parser in a debugger and found that the headers seem to
> have a couple of garbage entries.
>
> After the call to darshan_log_get_job(), the "len" fields in 
> fd->name_map and fd->mod_map[6] seem to be invalid:
>
> (gdb) p /x fd->name_map
> $42 = {off = 0x1fc, len = 0xfffffffffffffe04}
> (gdb) p /x fd->mod_map[6]
> $43 = {off = 0x277, len = 0xfffffffffffffd89}
>
> Have you seen errors like these before? Any idea why they're 
> happening?  Since it's only one user, I suspect it's something in 
> their code, perhaps a failure during MPI_Finalize.
>
>
>
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users



