[Darshan-users] getting plots

Burlen Loring bloring at lbl.gov
Mon Mar 14 23:53:51 CDT 2016


Yes, you are correct: it's file-per-process on 6496 processes, and the
simulation runs for 100 time steps, plus there are some header files and
directories created (I think by rank 0). It doesn't seem like too
extreme a case to me. We will also run on 50k cores for 100 time steps.
It sounds like Darshan can't analyze this type of I/O, but please let me
know if you have any ideas!

On the size discrepancy: my fault. Darshan had the size correct. I was
looking at the wrong output file; 200G is the size of the smaller run
(812 procs). I apologize that I didn't notice that sooner!

On 03/14/2016 08:55 PM, Shane Snyder wrote:
> The job summary graphs might be hanging because of the number of
> files the application is opening. It looks like there are roughly
> 650,000 files (100 each for 6,496 processes). I haven't tried
> generating graphs for any logs that large myself, but that might be
> beyond what the graphing utilities can realistically handle. It takes
> forever for me to even parse the logs in text form.
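
When a log holds this many file records, one alternative to the graphs is to stream the text-form output and keep only running totals. The following is a rough sketch under stated assumptions, not a Darshan tool. It assumes each data line of the text output has whitespace-separated columns in the order module, rank, record id, counter, value (followed by file name fields), and that comment lines start with `#`. The function name `tally_counters` is hypothetical, and the counter names you pass in should be checked against your own output before trusting the totals.

```python
def tally_counters(lines, wanted):
    """Stream text-form log lines and sum the integer counters named in `wanted`.

    ASSUMPTION: columns are <module> <rank> <record id> <counter> <value> ...
    and lines beginning with '#' are comments. Verify against real output.
    Returns (totals_by_counter, number_of_distinct_module_record_pairs).
    """
    totals = {}
    records = set()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data line in the assumed layout
        module, _rank, record_id, counter, value = fields[:5]
        # A file seen through two modules counts as two records here.
        records.add((module, record_id))
        if counter not in wanted:
            continue
        try:
            amount = int(value)
        except ValueError:
            continue  # floating-point counters (e.g. timers) are ignored
        totals[counter] = totals.get(counter, 0) + amount
    return totals, len(records)
```

Because the function accepts any iterable of lines, it can be handed `sys.stdin` with the parser output piped in, so memory use grows only with the number of distinct records rather than with the size of the text dump.
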
>
> As for the discrepancy in size, that may just be due to what the 'du' 
> utility is actually reporting. 'du' measures the size of a given file 
> based on the underlying file system block size. If the file is 1 byte, 
> and the block size is 1 MiB, the file is reported as 1 MiB. 
> Additionally, if you run 'du' on a directory containing numerous 
> subdirectories (as you have, 100 subdirectories), it counts the sizes 
> of the directories as well. Darshan will only report the I/O observed 
> at the application level, so it does not account for file system 
> blocks or directories. You can use 'du -b' to show the "actual" size
> (i.e., not rounded up to block sizes) of individual files, though it
> still counts subdirectory sizes when determining the size of a given
> directory. If you do that, is it closer to what Darshan reports?
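
The difference between the two measurements can be checked directly. The sketch below is a minimal Python illustration, not part of Darshan: it walks a directory tree and totals both the apparent size (`st_size`, the byte count that `du -b` sums for files) and the allocated size (`st_blocks * 512`, close to what plain `du` reports for files). The function name `tree_sizes` is hypothetical, `st_blocks` is only available on POSIX systems, and unlike `du` this sketch ignores the directories themselves and counts hard-linked files once per link.

```python
import os
import stat

def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes, n_files) for regular files under root."""
    apparent = 0
    allocated = 0
    n_files = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symbolic links are not followed.
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular entries
            apparent += st.st_size           # bytes the application wrote
            allocated += st.st_blocks * 512  # st_blocks is in 512-byte units
            n_files += 1
    return apparent, allocated, n_files
```

The apparent total is the figure to set against the application-level byte count; a large gap between apparent and allocated totals points at block rounding on many small files.
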
>
> --Shane
>
> On 03/14/2016 06:44 PM, Burlen Loring wrote:
>> sure, here is the link
>> https://drive.google.com/open?id=0B3y5yyus32lveHljWkExal9TVmM
>>
>> On 03/14/2016 03:56 PM, Shane Snyder wrote:
>>> Hi Burlen,
>>>
>>> Would you mind sharing your Darshan log with us? If you prefer, you 
>>> can send it to me off-list, or if it contains sensitive information 
>>> we can give you details on how to anonymize parts of it (e.g., file 
>>> names, etc.).
>>>
>>> I don't know for sure the historical reason why the "(may be
>>> incorrect)" caveat is given with the total bytes read and written.
>>> Someone correct me if I'm wrong, but I suspect it warns of the
>>> possibility that the code actually wrote/read more data than
>>> expected from the application's point of view. For instance, an
>>> I/O optimization called data sieving is possible at the MPI-IO
>>> layer, which results in more data being read than the application
>>> requested in order to improve performance. That shouldn't
>>> account for the drastic discrepancy you are seeing, though, so
>>> perhaps something else is up.
>>>
>>> Thanks,
>>> --Shane
>>>
>>> On 03/14/2016 05:29 PM, Burlen Loring wrote:
>>>> Hi, I'd like to analyze our runs with darshan. I'm able to get the 
>>>> log files, but so far no luck plotting them.
>>>>
>>>> In the terminal after a while I see the following output, but then 
>>>> the program appears to hang. After ~20 min of no output and no 
>>>> evidence of it running in top, I killed it, and I didn't see any 
>>>> newly created files.
>>>>
>>>> I'm also wondering about the total bytes report and the warning that
>>>> it may be wrong. It does indeed seem way off: du reports 1.6T, but
>>>> darshan only reports ~200G.
>>>>
>>>> Please let me know what I did wrong, and whether I should be
>>>> concerned about the numbers being so far off.
>>>>
>>>> Thanks
>>>> Burlen
>>>>
>>>> $ /work/apps/darshan/3.0.0-pre/bin/darshan-job-summary.pl 
>>>> loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan
>>>> Slowest unique file time: 25.579892
>>>> Slowest shared file time: 0
>>>> Total bytes read and written by app (may be incorrect): 214218545937
>>>> Total absolute I/O time: 25.579892
>>>> **NOTE: above shared and unique file times calculated using MPI-IO 
>>>> timers if MPI-IO interface used on a given file, POSIX timers 
>>>> otherwise.
>>>> _______________________________________________
>>>> Darshan-users mailing list
>>>> Darshan-users at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>
>>
>


