[Darshan-users] getting plots

Shane Snyder ssnyder at mcs.anl.gov
Mon Mar 14 22:55:49 CDT 2016


The job summary graphs may be hanging because of the number of files 
the application opens. It looks like there are nearly 650,000 of them 
(100 for each of 6,496 processes). I haven't tried generating graphs 
for a log that large myself, but it may be beyond what the graphing 
utilities can realistically handle; even parsing a log that size in 
text form takes me a long time.
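
If you just want to inspect the counters without the graphing step, the 
text parser can be run directly; a minimal invocation, assuming the 
same install prefix as the darshan-job-summary.pl command further down:

$ /work/apps/darshan/3.0.0-pre/bin/darshan-parser \
    loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan > counters.txt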

As for the size discrepancy, that may just come down to what the 'du' 
utility is actually reporting. 'du' measures the size of a given file 
in units of the underlying file system's block size: if the file is 
1 byte and the block size is 1 MiB, the file is reported as 1 MiB. 
Additionally, if you run 'du' on a directory containing numerous 
subdirectories (as you have, 100 of them), it counts the sizes of the 
directory entries as well. Darshan only reports the I/O observed at 
the application level, so it accounts for neither file system blocks 
nor directories. You can use 'du -b' to show the "actual" size (i.e., 
not rounded up to the block size) of individual files, though it still 
counts subdirectory sizes when determining the size of a directory. If 
you do that, is the result closer to what Darshan reports?
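
For example (directory name hypothetical, assuming GNU coreutils 'du'):

$ du -sh run_output/                   # on-disk usage: block-rounded, includes directories
$ du -sh --apparent-size run_output/   # byte ("apparent") sizes; 'du -b' gives per-file byte counts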

--Shane

On 03/14/2016 06:44 PM, Burlen Loring wrote:
> sure, here is the link
> https://drive.google.com/open?id=0B3y5yyus32lveHljWkExal9TVmM
>
> On 03/14/2016 03:56 PM, Shane Snyder wrote:
>> Hi Burlen,
>>
>> Would you mind sharing your Darshan log with us? If you prefer, you 
>> can send it to me off-list, or if it contains sensitive information 
>> we can give you details on how to anonymize parts of it (e.g., file 
>> names).
>>
>> I don't know for certain the historical reason the "(may be 
>> incorrect)" caveat is given with the total bytes read and written. 
>> Someone correct me if I'm wrong, but I suspect it is there to warn 
>> that the code may actually have read/written more data than expected 
>> from the application's point of view. For instance, data sieving, an 
>> I/O optimization possible at the MPI-IO layer, improves performance 
>> by reading more data at that layer than the application itself 
>> requested. That shouldn't account for the drastic discrepancy you 
>> are seeing, though, so perhaps something else is up.
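>>
>> As a rough sketch of the effect (numbers hypothetical): if a process 
>> reads 4 bytes at each of 1,000 offsets spaced 1 MiB apart, data 
>> sieving may service that with a single ~1 GiB contiguous read, 
>> discarding the gaps, so the bytes moved at the POSIX level can far 
>> exceed the ~4 KB the application actually asked for.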
>>
>> Thanks,
>> --Shane
>>
>> On 03/14/2016 05:29 PM, Burlen Loring wrote:
>>> Hi, I'd like to analyze our runs with Darshan. I'm able to get the 
>>> log files, but so far no luck plotting them.
>>>
>>> In the terminal after a while I see the following output, but then 
>>> the program appears to hang. After ~20 min of no output and no 
>>> evidence of it running in top, I killed it, and I didn't see any 
>>> newly created files.
>>>
>>> I'm also wondering about the total bytes report and the warning 
>>> that it may be wrong. It does indeed seem way off: du reports 1.6T, 
>>> but Darshan only reports ~200G.
>>>
>>> Please let me know what I did wrong, and whether I should be 
>>> concerned about the numbers being so far off.
>>>
>>> Thanks
>>> Burlen
>>>
>>> $/work/apps/darshan/3.0.0-pre/bin/darshan-job-summary.pl 
>>> loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan
>>> Slowest unique file time: 25.579892
>>> Slowest shared file time: 0
>>> Total bytes read and written by app (may be incorrect): 214218545937
>>> Total absolute I/O time: 25.579892
>>> **NOTE: above shared and unique file times calculated using MPI-IO 
>>> timers if MPI-IO interface used on a given file, POSIX timers 
>>> otherwise.
>>
>


