[Darshan-users] getting plots

Harms, Kevin harms at alcf.anl.gov
Tue Mar 15 09:06:52 CDT 2016


Burlen,

  the job summary parser isn't well optimized, so creating a PDF for a log with a large number of files can be time-consuming. I have analyzed logs with 1.4M files and it takes a while, maybe ~20 minutes. Also note that it generates vector images, so you will have many points to render when you view the PDF.

kevin




>Yes, you are correct, it's file per process on 6496 processes, and the 
>simulation runs for 100 time steps, plus there are some header files and 
>directories created (I think by rank 0). It doesn't seem like too 
>extreme of a case to me. We will also run on 50k cores for 100 time steps. 
>It sounds like darshan can't analyze this type of i/o, but please let me 
>know if you have any ideas!
>
>On the size discrepancy. My fault. Darshan had the size correct. I was 
>looking at the wrong output file, 200G is the size of the smaller run 
>(812 procs). I apologize that I didn't notice that sooner!
>
>On 03/14/2016 08:55 PM, Shane Snyder wrote:
>> The reason the job summary graphs are hanging might be the number of 
>> files the application is opening. It looks like there are over 500,000 
>> files (100 each for 6,496 processes). I haven't tried generating graphs 
>> for any logs that large myself, but that might be beyond what the 
>> graphing utilities can realistically handle. It takes forever for me to 
>> even parse the logs in text form.
>>
>> As for the discrepancy in size, that may just be due to what the 'du' 
>> utility is actually reporting. 'du' measures the size of a given file 
>> based on the underlying file system block size. If the file is 1 byte, 
>> and the block size is 1 MiB, the file is reported as 1 MiB. 
>> Additionally, if you run 'du' on a directory containing numerous 
>> subdirectories (as you have, 100 subdirectories), it counts the sizes 
>> of the directories as well. Darshan will only report the I/O observed 
>> at the application level, so it does not account for file system 
>> blocks or directories. You can use 'du -b' to show the "actual" sizes 
>> (i.e., not rounded up to block sizes) of individual files, though it 
>> still counts subdirectory sizes when determining the size of a given 
>> directory. If you do that, is it closer to what Darshan reports?
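
[A quick way to see the block-size rounding described above; a minimal sketch assuming a Linux system, since 'du -b' is a GNU coreutils option:]

```shell
# Write a 1-byte file: it still occupies a full file system block,
# so plain `du` reports the allocated size in KiB blocks, while
# `du -b` reports the apparent (byte-level) size.
tmpfile=$(mktemp)
printf 'x' > "$tmpfile"
du -b "$tmpfile"   # apparent size: 1 byte
du "$tmpfile"      # allocated size, e.g. 4 (KiB), depending on block size
rm -f "$tmpfile"
```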
>>
>> --Shane
>>
>> On 03/14/2016 06:44 PM, Burlen Loring wrote:
>>> sure, here is the link
>>> https://drive.google.com/open?id=0B3y5yyus32lveHljWkExal9TVmM
>>>
>>> On 03/14/2016 03:56 PM, Shane Snyder wrote:
>>>> Hi Burlen,
>>>>
>>>> Would you mind sharing your Darshan log with us? If you prefer, you 
>>>> can send it to me off-list, or if it contains sensitive information 
>>>> we can give you details on how to anonymize parts of it (e.g., file 
>>>> names, etc.).
>>>>
>>>> I don't know for certain the historical reason the "(may be 
>>>> incorrect)" caveat is given with the total bytes read and written. 
>>>> Someone correct me if I'm wrong, but I suspect it is there to warn 
>>>> that the code may actually have read/written more data than expected 
>>>> from the application's point of view. For instance, an I/O 
>>>> optimization called data sieving, possible at the MPI-IO layer, 
>>>> improves performance but results in more data being read than the 
>>>> application itself requested. That shouldn't account for the drastic 
>>>> discrepancy you are seeing, though, so perhaps something else is up.
>>>>
>>>> Thanks,
>>>> --Shane
>>>>
>>>> On 03/14/2016 05:29 PM, Burlen Loring wrote:
>>>>> Hi, I'd like to analyze our runs with darshan. I'm able to get the 
>>>>> log files, but so far no luck plotting them.
>>>>>
>>>>> In the terminal after a while I see the following output, but then 
>>>>> the program appears to hang. After ~20 min of no output and no 
>>>>> evidence of it running in top, I killed it, and I didn't see any 
>>>>> newly created files.
>>>>>
>>>>> I'm also wondering about the total bytes report and the warning 
>>>>> that it may be wrong. It does indeed seem way off: du reports 1.6T, 
>>>>> but Darshan reports only ~200G.
>>>>>
>>>>> Please let me know what I did wrong, and whether I should be 
>>>>> concerned about the numbers being so far off.
>>>>>
>>>>> Thanks
>>>>> Burlen
>>>>>
>>>>> $/work/apps/darshan/3.0.0-pre/bin/darshan-job-summary.pl 
>>>>> loring_oscillator_id1336621_3-14-37256-5315836542621785504_1.darshan
>>>>> Slowest unique file time: 25.579892
>>>>> Slowest shared file time: 0
>>>>> Total bytes read and written by app (may be incorrect): 214218545937
>>>>> Total absolute I/O time: 25.579892
>>>>> **NOTE: above shared and unique file times calculated using MPI-IO 
>>>>> timers if MPI-IO interface used on a given file, POSIX timers 
>>>>> otherwise.
>>>>> _______________________________________________
>>>>> Darshan-users mailing list
>>>>> Darshan-users at lists.mcs.anl.gov
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>
>>>
>>
>