[Darshan-users] getting plots
Burlen Loring
bloring at lbl.gov
Tue Mar 15 15:22:12 CDT 2016
Shane, Phil, these commands are all working fine. Thanks!
One thing I'd like to do is separate the open from the write. The
file-list-detailed option may give me what I need.
Are the values in these columns time stamps, so that I can estimate write
time as "<end_close> - <start_write>" and open time as "<start_write> -
<start_open>"?
I didn't see documentation for those columns and want to make sure I
have it right before embarking on a wild goose chase.
<start_open> <start_read> <start_write> <end_read> <end_write> <end_close>
0.770823     0.000000     1.602510      0.000000   1.607319    1.608949
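If the columns are indeed timestamps in seconds relative to job start (an assumption; the thread does not confirm it), the two differences proposed above can be computed as in the sketch below. It assumes the six timestamp fields have already been cut out of the full file-list-detailed row, which may carry other columns too. Note that start_write - start_open also includes anything the process did between opening the file and its first write, so it is an upper bound on the cost of the open itself; likewise end_close - start_write includes the close.

```python
# Minimal sketch: estimate open and write phases from the six timestamp
# columns quoted above. Assumes (unconfirmed) that they are seconds since
# job start and that these fields were already extracted from the full row.

COLUMNS = ["start_open", "start_read", "start_write",
           "end_read", "end_write", "end_close"]


def parse_row(line):
    """Map the six whitespace-separated timestamp fields to column names."""
    values = [float(tok) for tok in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def estimate_phases(row):
    """Open phase: start_open -> start_write; write phase: start_write -> end_close."""
    return {
        "open_time": row["start_write"] - row["start_open"],
        "write_time": row["end_close"] - row["start_write"],
    }


# The read columns are 0.0 in this row, presumably because the file was never read.
row = parse_row("0.770823 0.000000 1.602510 0.000000 1.607319 1.608949")
phases = estimate_phases(row)
print(f"open ~ {phases['open_time']:.6f} s, write ~ {phases['write_time']:.6f} s")
```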
Burlen
On 03/15/2016 10:48 AM, Phil Carns wrote:
> One thing that you can do (not sure if it would be helpful in this case)
> is filter a Darshan log file down so that it only includes
> instrumentation data for a single file, and then run
> darshan-job-summary.pl on just that one-file view. If you wanted to
> try that, you can do the following:
>
> $ darshan-parser --file-list \
>     loring_oscillator_id1336628_3-14-37278-8825292184016672560_1.darshan \
>     | head -n 75
>
> # I'm just picking a file at random from the output, but this example is for
> # /global/cscratch1/sd/loring/sensei/fpp/10k/PosthocIO_5.vtmb,
> # so I'm using its corresponding hash value. The following command will
> # write a new darshan log file that strips away everything except for
> # that one file:
>
> $ darshan-convert --file 13503923528039498363 \
>     loring_oscillator_id1336628_3-14-37278-8825292184016672560_1.darshan \
>     onefile.darshan
>
> $ darshan-job-summary.pl onefile.darshan
>
> The resulting pdf is generated instantaneously and is easy to open,
> but it doesn't tell you anything about the I/O at all except what
> happened to that one file. It might be helpful for some cases, though.
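To pick the hash for a particular path out of the --file-list output programmatically rather than by eye, something like the sketch below could work. The column layout is an assumption (lines starting with '#' are headers, the hash is the first field, and the full path is one of the later fields); check it against the header your darshan-parser version prints. The helper names find_hash, convert_command, and extract_one_file are hypothetical; the darshan-parser and darshan-convert arguments are the ones used in the message above.

```python
import subprocess
from typing import List, Optional


def find_hash(file_list_text: str, path: str) -> Optional[str]:
    """Return the record hash for `path` from `darshan-parser --file-list` text.

    Assumed layout (verify against your version's header): '#' lines are
    headers, the hash is the first field, and the path is a later field.
    Paths containing spaces are not handled.
    """
    for line in file_list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if path in fields[1:]:
            return fields[0]  # first field assumed to be the record hash
    return None


def convert_command(record_hash: str, log: str, out: str) -> List[str]:
    """Build the darshan-convert invocation shown in the message above."""
    return ["darshan-convert", "--file", record_hash, log, out]


def extract_one_file(log: str, path: str, out: str = "onefile.darshan") -> None:
    """Hypothetical wrapper: list records, find `path`, write a one-file log."""
    listing = subprocess.run(["darshan-parser", "--file-list", log],
                             check=True, capture_output=True, text=True).stdout
    record_hash = find_hash(listing, path)
    if record_hash is None:
        raise LookupError(f"{path} not found in {log}")
    subprocess.run(convert_command(record_hash, log, out), check=True)
```

The resulting onefile.darshan can then be passed to darshan-job-summary.pl as in the message above.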
>
> You can also use the following to get a text dump of the cumulative
> statistics across all files (which also runs pretty quickly):
>
> $ darshan-parser --total \
>     loring_oscillator_id1336628_3-14-37278-8825292184016672560_1.darshan
>
> Unfortunately that output is presented in text format instead of
> producing another darshan log that could then be visualized with
> darshan-job-summary.pl, but maybe that is something we could consider
> in a future version.
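In the meantime, the text dump can be turned into structured data with a few lines of code, for example JSON that a plotting tool could read. This sketch assumes each aggregate counter is printed as "total_<COUNTER>: <value>" and skips every other line; the counter names used in bytes_moved (POSIX_BYTES_READ and POSIX_BYTES_WRITTEN) are likewise assumptions to verify against your own output.

```python
import json


def parse_totals(text):
    """Collect 'total_<NAME>: <value>' lines into {NAME: float}."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("total_"):
            continue  # headers, comments, blank lines
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no 'name: value' separator on this line
        try:
            totals[name[len("total_"):].strip()] = float(value)
        except ValueError:
            continue  # skip non-numeric values
    return totals


def bytes_moved(totals):
    """Sum POSIX bytes read and written (counter names are assumed)."""
    return totals.get("POSIX_BYTES_READ", 0.0) + totals.get("POSIX_BYTES_WRITTEN", 0.0)


def to_json(totals):
    """Serialize the parsed counters for use by other tools."""
    return json.dumps(totals, indent=2, sort_keys=True)
```

For example, saving the darshan-parser --total output to a text file and passing its contents to parse_totals and then to_json gives one JSON object per log.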
>
> thanks,
> -Phil
>
> On 03/15/2016 11:18 AM, Burlen Loring wrote:
>> I let darshan-job-summary.pl run all night; it is still going, but there
>> is no indication of progress.
>>
>> This is my first experience with darshan, so let me ask a naive question:
>> is it possible to extract time series for a single process? Write
>> bandwidth over time, for instance, or time for file open (or close)
>> vs. time?
>>
>> Thanks for all your help
>> Burlen
>>
>> On 03/14/2016 09:53 PM, Burlen Loring wrote:
>>> Yes, you are correct: it's file per process on 6496 processes, and the
>>> simulation runs for 100 time steps, plus there are some header files
>>> and directories created (I think by rank 0). It doesn't seem like too
>>> extreme a case to me. We will also run on 50k cores for 100 time
>>> steps. It sounds like darshan can't analyze this type of I/O, but
>>> please let me know if you have any ideas!
>>>
>>> On the size discrepancy: my fault. Darshan had the size correct. I was
>>> looking at the wrong output file; 200G is the size of the smaller run
>>> (812 procs). I apologize for not noticing that sooner!
>>>
>>> On 03/14/2016 08:55 PM, Shane Snyder wrote:
>>>> Maybe the job summary graphs are hanging because of the number of
>>>> files the application is opening? It looks like there are over
>>>> 500,000 files (100 each for 6,496 processes, i.e., roughly 650,000).
>>>> I haven't tried generating graphs for any logs that large myself, but
>>>> that might be beyond what the graphing utilities can realistically
>>>> handle. It takes forever for me to even parse the logs in text form.
>>>>
>>>> As for the discrepancy in size, that may just be due to what the 'du'
>>>> utility is actually reporting. 'du' measures the size of a given file
>>>> based on the underlying file system block size: if the file is 1
>>>> byte and the block size is 1 MiB, the file is reported as 1 MiB.
>>>> Additionally, if you run 'du' on a directory containing numerous
>>>> subdirectories (as you have, 100 subdirectories), it counts the sizes
>>>> of the directories as well. Darshan only reports the I/O observed
>>>> at the application level, so it does not account for file system
>>>> blocks or directories. You can use 'du -b' to show the "actual" sizes
>>>> (i.e., not rounded up to block sizes) of individual files, though it
>>>> still counts subdirectory sizes when determining the size of a given
>>>> directory. If you do that, is it closer to what Darshan reports?
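The difference described above can also be checked directly from file metadata. The sketch below sums st_size (the byte count the file reports, close to what 'du -b' shows for files) and st_blocks * 512 (the space actually allocated, close to what plain 'du' shows) over regular files only, leaving directory entries out. It is Unix-only, since st_blocks is not available on Windows, and unlike du it does not de-duplicate hard links. The function name file_sizes is hypothetical.

```python
import os
import stat


def file_sizes(root):
    """Return (apparent_bytes, allocated_bytes) over regular files under root.

    Directories themselves are not counted, only the files inside them.
    """
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular files
            apparent += st.st_size           # bytes the application wrote/sees
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units
    return apparent, allocated
```

Calling file_sizes on the run's output directory returns both totals, and the first number is the one to compare with Darshan's byte count.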
>>>>
>>>> --Shane
>>>>
>>>> On 03/14/2016 06:44 PM, Burlen Loring wrote:
>>>>> sure, here is the link
>>>>> https://drive.google.com/open?id=0B3y5yyus32lveHljWkExal9TVmM
>>>>>
>>>>> On 03/14/2016 03:56 PM, Shane Snyder wrote:
>>>>>> Hi Burlen,
>>>>>>
>>>>>> Would you mind sharing your Darshan log with us? If you prefer, you
>>>>>> can send it to me off-list, or if it contains sensitive information
>>>>>> we can give you details on how to anonymize parts of it (e.g., file
>>>>>> names, etc.).
>>>>>>
>>>>>> I don't know for sure the historical reason why the "(may be
>>>>>> incorrect)" caveat is given with the total bytes read and written.
>>>>>> Someone correct me if I'm wrong, but I suspect it is to warn
>>>>>> against the possibility that the code actually wrote/read more data
>>>>>> than expected from the application's point of view. For instance,
>>>>>> an I/O optimization called data sieving is possible at the MPI-IO
>>>>>> layer, which reads more data than the application requested in
>>>>>> order to improve performance. That shouldn't account for the
>>>>>> drastic discrepancy you are seeing, though, so perhaps something
>>>>>> else is up.
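As a rough illustration of how data sieving can make the bytes actually read exceed what the application asked for, the simplified model below replaces several small non-contiguous reads with one contiguous read spanning all of them. Real MPI-IO implementations such as ROMIO cap the span with a buffer-size hint and may split it into several reads; this model ignores that and assumes the requests do not overlap. The function name sieving_overhead is hypothetical.

```python
def sieving_overhead(requests):
    """Compare bytes requested with bytes read under a single sieved read.

    requests: list of (offset, length) pairs, assumed non-overlapping.
    Returns (bytes_requested, bytes_read).
    """
    if not requests:
        return 0, 0
    requested = sum(length for _offset, length in requests)
    start = min(offset for offset, _length in requests)
    end = max(offset + length for offset, length in requests)
    return requested, end - start  # one read covers [start, end)


# Three 10-byte pieces spread over a 210-byte region.
print(sieving_overhead([(0, 10), (100, 10), (200, 10)]))  # (30, 210)
```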
>>>>>>
>>>>>> Thanks,
>>>>>> --Shane
>>>>>>
>>>>>> On 03/14/2016 05:29 PM, Burlen Loring wrote:
>>>>>>> Hi, I'd like to analyze our runs with darshan. I'm able to get the
>>>>>>> log files, but so far no luck plotting them.
>>>>>>>
>>>>>>> In the terminal, after a while I see the following output, but then
>>>>>>> the program appears to hang. After ~20 min of no output and no
>>>>>>> evidence of it running in top, I killed it, and I didn't see any
>>>>>>> newly created files.
>>>>>>>
>>>>>>> I'm also wondering about the total bytes report and the warning that
>>>>>>> it may be wrong. It does indeed seem way off: du reports 1.6T, but
>>>>>>> darshan only reports ~200G.
>>>>>>>
>>>>>>> Please let me know what I did wrong, and whether I should be
>>>>>>> concerned about the numbers being so far off.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Burlen
>>>>>>>
>>>>>>> $ /work/apps/darshan/3.0.0-pre/bin/darshan-job-summary.pl \
>>>>>>>     loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan
>>>>>>>
>>>>>>> Slowest unique file time: 25.579892
>>>>>>> Slowest shared file time: 0
>>>>>>> Total bytes read and written by app (may be incorrect): 214218545937
>>>>>>> Total absolute I/O time: 25.579892
>>>>>>> **NOTE: above shared and unique file times calculated using MPI-IO
>>>>>>> timers if MPI-IO interface used on a given file, POSIX timers
>>>>>>> otherwise.
>>>>>>> _______________________________________________
>>>>>>> Darshan-users mailing list
>>>>>>> Darshan-users at lists.mcs.anl.gov
>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users