[Darshan-users] Bad data in file headers

Latham, Robert J. robl at mcs.anl.gov
Fri Dec 1 15:59:28 CST 2017


On Fri, 2017-12-01 at 11:24 -0500, Phil Carns wrote:
> Hi Ed,
> 
> I don't think we've seen this particular error before.  Is it also
> the same application/executable every time in addition to being the
> same user in each case?
> 
> That portion of the log file is the index that tells the parser
> library where to find data for each of the instrumentation modules. 
> Every Darshan log uses that index, so it would be unusual (but of
> course not impossible by any means!) for it to be broken.  It is
> deterministically malloc'd at the same point in time when Darshan is
> initialized:

Is there any chance darshan2 tools are trying to operate on darshan3
files?  I hate to offer such an obvious suggestion, but you have
generated both log formats so at some point in time you had darshan2
tooling lying around.  Any darshan2 libraries or tools are probably
long gone by now but I wanted to rule out the obvious.

==rob


> https://xgitlab.cels.anl.gov/darshan/darshan/blob/master/darshan-runtime/lib/darshan-core.c#L245
> 
> If it is the same executable triggering the problem in each case,
> then I would be suspicious of a stack overflow or some other memory
> corruption in the application that just happens to cause collateral
> damage in the address range that this malloc is getting.  
> 
> Unfortunately that's the kind of thing that's hard to isolate after
> the fact, though; all we know for sure is that the log is broken.  If
> you have access to the source code it sounds like it might be pretty
> reproducible at run time, though.
> 
> thanks,
> -Phil
> 
> On 11/29/2017 12:14 PM, Ed Karrels wrote:
> > I'm scanning through Darshan logs from Blue Waters, and
> > darshan-parser fails on a bunch (1588) of log files.
> > They're all Darshan version 3 files, and all from the
> > same user. Every one of this user's Darshan version 3 files fails.
> > Their Darshan version 2 files are fine.
> > 
> > I ran darshan-parser in a debugger, and found that the headers seem
> > to have a couple garbage entries.
> > 
> > After the call to darshan_log_get_job(), the "len" fields in
> > fd->name_map and fd->mod_map[6] seem to be invalid:
> > 
> > (gdb) p /x fd->name_map
> > $42 = {off = 0x1fc, len = 0xfffffffffffffe04}
> > (gdb) p /x fd->mod_map[6]
> > $43 = {off = 0x277, len = 0xfffffffffffffd89}
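
One thing worth checking in the two values printed above: read as signed 64-bit integers, each "len" is exactly the negative of its "off". The sketch below does only that arithmetic on the numbers from the gdb session. The helper name as_signed64 is made up for illustration and is not part of Darshan. The pattern would be consistent with a length computed as (end - off) where the end offset was still zero, but that is an assumption, not something the log alone confirms.

```python
MASK64 = (1 << 64) - 1


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed value."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


# (off, len) pairs copied from the gdb session quoted above.
entries = {
    "name_map": (0x1FC, 0xFFFFFFFFFFFFFE04),
    "mod_map[6]": (0x277, 0xFFFFFFFFFFFFFD89),
}

for name, (off, length) in entries.items():
    signed_len = as_signed64(length)
    # With 64-bit wraparound, off + len comes out to zero for both entries.
    wrapped_sum = (off + length) & MASK64
    print(f"{name}: off={off} len={signed_len} (off+len) mod 2^64={wrapped_sum}")
```

If that reading holds, the entries look less like random garbage and more like an index whose end offsets were never filled in, which may help narrow down where in the run the log was cut short.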
> > 
> > Have you seen errors like these before? Any idea why they're
> > happening?  Since it's only one user, I suspect it's something in
> > their code, perhaps a failure during MPI_Finalize.
> > 
> > 
> > 
> > _______________________________________________
> > Darshan-users mailing list
> > Darshan-users at lists.mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
> 

