[Darshan-users] getting plots

Shane Snyder ssnyder at mcs.anl.gov
Tue Mar 15 12:29:48 CDT 2016


When I try to generate the graphs on my laptop, it finishes in about 20 
minutes, but I do get an error and the graphs aren't generated 
properly. I'll dig in and try to find out why.

Darshan doesn't log much data that is useful for constructing accurate 
time series representations of the I/O. Darshan logs aggregate operation 
counts, cumulative timers, and some timestamps demarcating the phases in 
which I/O is performed (e.g., the timestamp of the first open, first 
read, and first write, as well as the timestamp of the last file close, 
last read, and last write). Given that, we can estimate the average I/O 
bandwidth for an application, but in general we can't really get an 
accurate view of the bandwidth at specific times in the execution. 
Depending on the I/O workload the application is using, these estimates 
can be very close to what was actually achieved, or they can be pretty 
far off. You can check out these estimates by running 'darshan-parser' 
on your Darshan log with the '--perf' flag; that will print a few 
different estimates of performance. You can find more info here:

http://www.mcs.anl.gov/research/projects/darshan/docs/darshan3-util.html
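
For example (the log file name below is a placeholder; substitute the 
path to your own log):

$ darshan-parser --perf /path/to/your_app.darshan

Comparing the different estimates it prints should give you a sense of 
how reliable they are for your particular workload.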

--Shane

On 03/15/2016 11:18 AM, Burlen Loring wrote:
> I let darshan-job-summary run all night; it's still going, with no 
> indication of progress.
>
> This is my first experience with darshan, so let me ask a naive 
> question: is it possible to extract time series for a single process? 
> Write bandwidth over time, for instance, or time spent in file opens 
> (or closes) vs. time?
>
> Thanks for all your help
> Burlen
>
> On 03/14/2016 09:53 PM, Burlen Loring wrote:
>> Yes, you are correct, it's file per process on 6496 processes, and 
>> the simulation runs for 100 time steps, plus there are some header 
>> files and directories created (I think by rank 0). It doesn't seem 
>> like too extreme a case to me. We will also run 50k cores for 100 
>> time steps. It sounds like darshan can't analyze this type of I/O, 
>> but please let me know if you have any ideas!
>>
>> On the size discrepancy: my fault. Darshan had the size correct; I 
>> was looking at the wrong output file. 200G is the size of the smaller 
>> run (812 procs). I apologize for not noticing that sooner!
>>
>> On 03/14/2016 08:55 PM, Shane Snyder wrote:
>>> The reason the job summary graphs are hanging might be the number 
>>> of files the application is opening. It looks like there 
>>> are over 500,000 files (100 each for 6,496 processes). I haven't 
>>> tried generating graphs for any logs that large myself, but that 
>>> might be beyond what the graphing utilities can realistically 
>>> handle. It takes forever for me to even parse the logs in text form.
>>>
>>> As for the discrepancy in size, that may just be due to what the 
>>> 'du' utility is actually reporting. 'du' measures the size of a 
>>> given file based on the underlying file system block size. If the 
>>> file is 1 byte, and the block size is 1 MiB, the file is reported as 
>>> 1 MiB. Additionally, if you run 'du' on a directory containing 
>>> numerous subdirectories (as you have, 100 subdirectories), it counts 
>>> the sizes of the directories as well. Darshan will only report the 
>>> I/O observed at the application level, so it does not account for 
>>> file system blocks or directories. You can use 'du -b' to show the 
>>> "actual" sizes (i.e., not rounded up to block sizes) of individual 
>>> files, 
>>> though it still counts subdirectory sizes when determining the size 
>>> of a given directory. If you do that, is it closer to what Darshan 
>>> reports?
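>>>
>>> A quick way to see the rounding in action (hypothetical tiny file; 
>>> the exact numbers depend on your file system's block size):
>>>
>>> $ printf x > tiny.dat
>>> $ du -h tiny.dat    # size rounded up to a whole block
>>> 4.0K    tiny.dat
>>> $ du -b tiny.dat    # apparent size in bytes
>>> 1       tiny.dat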
>>>
>>> --Shane
>>>
>>> On 03/14/2016 06:44 PM, Burlen Loring wrote:
>>>> Sure, here is the link:
>>>> https://drive.google.com/open?id=0B3y5yyus32lveHljWkExal9TVmM
>>>>
>>>> On 03/14/2016 03:56 PM, Shane Snyder wrote:
>>>>> Hi Burlen,
>>>>>
>>>>> Would you mind sharing your Darshan log with us? If you prefer, 
>>>>> you can send it to me off-list, or if it contains sensitive 
>>>>> information we can give you details on how to anonymize parts of 
>>>>> it (e.g., file names, etc.).
>>>>>
>>>>> I don't know for sure the historical reason the "(may be 
>>>>> incorrect)" caveat is given with the total bytes read and written. 
>>>>> Someone correct me if I'm wrong, but I suspect it is to warn 
>>>>> against the possibility that more data was actually read/written 
>>>>> than the application itself requested. For instance, the MPI-IO 
>>>>> layer can apply an I/O optimization called data sieving, which 
>>>>> reads more data than the application asked for in order to improve 
>>>>> performance. That shouldn't account for the drastic discrepancy 
>>>>> you are seeing, though, so perhaps something else is up.
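>>>>>
>>>>> As a side note, whether data sieving is used can typically be 
>>>>> controlled through MPI-IO hints. A minimal sketch (assuming ROMIO; 
>>>>> the "romio_ds_read"/"romio_ds_write" hint names are ROMIO-specific 
>>>>> and other MPI-IO implementations may ignore them):
>>>>>
>>>>> #include <mpi.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>     MPI_Init(&argc, &argv);
>>>>>
>>>>>     /* Ask the MPI-IO layer not to perform data sieving
>>>>>      * (ROMIO-specific hints; others may ignore them). */
>>>>>     MPI_Info info;
>>>>>     MPI_Info_create(&info);
>>>>>     MPI_Info_set(info, "romio_ds_read", "disable");
>>>>>     MPI_Info_set(info, "romio_ds_write", "disable");
>>>>>
>>>>>     MPI_File fh;
>>>>>     MPI_File_open(MPI_COMM_WORLD, "out.dat",
>>>>>                   MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
>>>>>     /* ... reads/writes here would bypass data sieving ... */
>>>>>     MPI_File_close(&fh);
>>>>>
>>>>>     MPI_Info_free(&info);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }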
>>>>>
>>>>> Thanks,
>>>>> --Shane
>>>>>
>>>>> On 03/14/2016 05:29 PM, Burlen Loring wrote:
>>>>>> Hi, I'd like to analyze our runs with darshan. I'm able to get 
>>>>>> the log files, but so far no luck plotting them.
>>>>>>
>>>>>> In the terminal after a while I see the following output, but 
>>>>>> then the program appears to hang. After ~20 min of no output and 
>>>>>> no evidence of it running in top, I killed it, and I didn't see 
>>>>>> any newly created files.
>>>>>>
>>>>>> I'm also wondering about the total bytes report and the warning 
>>>>>> that it may be wrong. It does indeed seem way off: du reports 
>>>>>> 1.6T, but darshan only reports ~200G.
>>>>>>
>>>>>> Please let me know what I did wrong, and whether I should be 
>>>>>> concerned about the numbers being so far off!
>>>>>>
>>>>>> Thanks
>>>>>> Burlen
>>>>>>
>>>>>> $ /work/apps/darshan/3.0.0-pre/bin/darshan-job-summary.pl 
>>>>>> loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan
>>>>>> Slowest unique file time: 25.579892
>>>>>> Slowest shared file time: 0
>>>>>> Total bytes read and written by app (may be incorrect): 214218545937
>>>>>> Total absolute I/O time: 25.579892
>>>>>> **NOTE: above shared and unique file times calculated using 
>>>>>> MPI-IO timers if MPI-IO interface used on a given file, POSIX 
>>>>>> timers otherwise.