[Darshan-users] getting plots
Shane Snyder
ssnyder at mcs.anl.gov
Mon Mar 14 22:55:49 CDT 2016
The job summary graphs may be hanging because of the number of files the
application is opening. It looks like there are over 500,000 files (100
each for 6,496 processes). I haven't tried generating graphs for any
logs that large myself, but that may be beyond what the graphing
utilities can realistically handle; it takes me a long time even to
parse logs like that in text form.
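For a sense of scale, the per-file record count follows from quick arithmetic on the numbers above:

```python
# Record-count arithmetic from the run described in this thread.
processes = 6496
files_per_process = 100
print(processes * files_per_process)  # 649600, i.e. "over 500,000 files"
```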
As for the size discrepancy, that may just be due to what the 'du'
utility actually reports. 'du' measures the size of a given file based
on the underlying file system block size: if the file is 1 byte and the
block size is 1 MiB, the file is reported as 1 MiB. Additionally, if you
run 'du' on a directory containing numerous subdirectories (as you have,
with 100 subdirectories), it counts the sizes of the directories
themselves as well. Darshan only reports the I/O observed at the
application level, so it does not account for file system blocks or
directories. You can use 'du -b' to show the "actual" size (i.e., not
rounded up to the block size) of individual files, though it still
counts subdirectory sizes when determining the size of a given
directory. If you do that, is the total closer to what Darshan reports?
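The block-rounding effect is easy to demonstrate. Here is a small sketch, assuming a Linux system where st_blocks is counted in 512-byte units (which is what plain 'du' works from):

```python
import os
import tempfile

# Create a 1-byte file and compare its apparent size with its allocated size.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x")

st = os.stat(f.name)
print(st.st_size)          # apparent size in bytes: 1 (what 'du -b' reports)
print(st.st_blocks * 512)  # allocated size, rounded up to file system
                           # blocks (what plain 'du' reports), e.g. 4096 on ext4
os.unlink(f.name)
```

The gap between the two numbers, multiplied across hundreds of thousands of small files plus directory entries, can account for a large difference between 'du' totals and Darshan's application-level byte counts.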
--Shane
On 03/14/2016 06:44 PM, Burlen Loring wrote:
> sure, here is the link
> https://drive.google.com/open?id=0B3y5yyus32lveHljWkExal9TVmM
>
> On 03/14/2016 03:56 PM, Shane Snyder wrote:
>> Hi Burlen,
>>
>> Would you mind sharing your Darshan log with us? If you prefer, you
>> can send it to me off-list, or if it contains sensitive information
>> we can give you details on how to anonymize parts of it (e.g., file
>> names, etc.).
>>
>> I don't know for sure the historical reason the "(may be incorrect)"
>> caveat is given with the total bytes read and written. Someone correct
>> me if I'm wrong, but I suspect it is to warn against the possibility
>> that the code actually read/wrote more data than expected from the
>> application's point of view. For instance, an I/O optimization called
>> data sieving is possible at the MPI-IO layer; to improve performance,
>> it results in more data being read than the application itself
>> requested. That shouldn't account for the drastic discrepancy you are
>> seeing, though, so perhaps something else is up.
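[A toy illustration of why data sieving inflates byte counts (this is not ROMIO's actual implementation, and the offsets are made up): when the application asks for several small, noncontiguous regions, the MPI-IO layer may issue one contiguous read covering all of them.]

```python
# Hypothetical request list: (offset, length) pairs the application wants.
requests = [(0, 4), (100, 4), (200, 4)]

app_bytes = sum(length for _, length in requests)
start = min(offset for offset, _ in requests)
end = max(offset + length for offset, length in requests)
sieved_bytes = end - start  # one covering read, as data sieving would issue

print(app_bytes)     # 12 bytes requested by the application
print(sieved_bytes)  # 204 bytes actually read from the file
```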
>>
>> Thanks,
>> --Shane
>>
>> On 03/14/2016 05:29 PM, Burlen Loring wrote:
>>> Hi, I'd like to analyze our runs with darshan. I'm able to get the
>>> log files, but so far no luck plotting them.
>>>
>>> In the terminal after a while I see the following output, but then
>>> the program appears to hang. After ~20 min of no output and no
>>> evidence of it running in top, I killed it, and I didn't see any
>>> newly created files.
>>>
>>> I'm also wondering about the total bytes report and the warning that
>>> it may be wrong. It does indeed seem way off: du reports 1.6T, but
>>> Darshan only reports ~200G.
>>>
>>> Please let me know what I did wrong, and whether I should be
>>> concerned about the numbers being so far off.
>>>
>>> Thanks
>>> Burlen
>>>
>>> $/work/apps/darshan/3.0.0-pre/bin/darshan-job-summary.pl
>>> loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan
>>> Slowest unique file time: 25.579892
>>> Slowest shared file time: 0
>>> Total bytes read and written by app (may be incorrect): 214218545937
>>> Total absolute I/O time: 25.579892
>>> **NOTE: above shared and unique file times calculated using MPI-IO
>>> timers if MPI-IO interface used on a given file, POSIX timers
>>> otherwise.
>>> _______________________________________________
>>> Darshan-users mailing list
>>> Darshan-users at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>
>