[Darshan-users] Does Darshan uses CPU time ?

Harms, Kevin harms at alcf.anl.gov
Thu Aug 5 09:40:58 CDT 2021


Florian,

  I looked at the log and it seems that you are opening multiple files in parallel on each process. I'm assuming you're using threads to do I/O within a process? If so, you would need to normalize by the total number of threads rather than the number of processes. I attached an analysis that plots some of the file timelines for one rank; you can see that they overlap, which indicates that threads must be opening these files in parallel.

  Darshan itself doesn't have any specifics for tracking number of threads or thread ids.
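
  As a rough illustration of the point above, here is a minimal Python sketch using the numbers from Florian's message quoted below (67 s of metadata time, 3 s of runtime, 12 processes). The function name and the idea of estimating a lower bound on threads per process are assumptions made for this example; Darshan does not report thread counts, so this only shows how much concurrency would be needed for the ratio to drop to 1 or below.

```python
import math

def min_threads_per_process(total_meta_seconds, runtime_seconds, nprocs):
    """Hypothetical helper: smallest whole number of concurrent I/O threads
    per process for which the accumulated metadata time fits inside
    runtime * nprocs * threads."""
    budget = runtime_seconds * nprocs  # wall-clock seconds available with 1 thread per process
    return max(1, math.ceil(total_meta_seconds / budget))

# Numbers taken from the quoted message: 67 s meta time, 3 s runtime, 12 processes.
ratio_per_process = 67.0 / (3.0 * 12)            # greater than 1, the "weird" value
threads = min_threads_per_process(67.0, 3.0, 12)
ratio_per_thread = 67.0 / (3.0 * 12 * threads)   # normalized over threads instead

print(f"per-process ratio: {ratio_per_process:.2f}")
print(f"threads needed per process (lower bound): {threads}")
print(f"per-thread ratio: {ratio_per_thread:.2f}")
```

  The real thread count has to come from the application itself; this estimate only says that a ratio above 1 is consistent with overlapping I/O calls and does not by itself prove a timer problem.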

kevin

________________________________________
From: Florian Lecomte <flo.lecomte17 at gmail.com>
Sent: Wednesday, August 4, 2021 9:35 AM
To: Harms, Kevin
Cc: darshan-users at lists.mcs.anl.gov
Subject: Re: [Darshan-users] Does Darshan uses CPU time ?

Hello,
Here is the log I generated. total_STDIO_F_META_TIME is 67 seconds, runtime is (end - start + 1) = 3 seconds (as computed in darshan-job-summary.pl), and nprocs is 12.
That would give a metadata time fraction of 67 / 36 > 1, which seems weird.
Is the log file wrong, or does the application whose I/O I want to analyze behave strangely?

Thank you for your help.
Cordially, Florian


On Wed, Aug 4, 2021 at 16:26, Harms, Kevin <harms at alcf.anl.gov> wrote:
Florian,

  that sounds like an issue. If you can provide the log, we can take a closer look at the counters.

kevin

________________________________________
From: Florian Lecomte <flo.lecomte17 at gmail.com>
Sent: Wednesday, August 4, 2021 3:14 AM
To: Harms, Kevin
Cc: darshan-users at lists.mcs.anl.gov
Subject: Re: [Darshan-users] Does Darshan uses CPU time ?

In fact, what I wanted to know is: how does Darshan measure read times, write times and metadata times? I suppose it uses CPU clocks, but I use several nodes, with 2 processes on each node, and each node has 64 cores. So I wanted to know whether the average time per process was also already divided by the number of cores, because cores work in parallel, in which case this time could not be compared with the real runtime.
I can send you the log file later today, but basically I have 36 seconds of metadata operations, 3 seconds of runtime and 10 processes.

Thank you.
Cordially, Florian

On Tue, Aug 3, 2021 at 17:20, Harms, Kevin <harms at alcf.anl.gov> wrote:
Florian,

  STDIO_F_META_TIME should be the time spent in metadata accumulated across the processes if it is shared. The rank for the file would be -1 if it is shared.

  So for example:
    4 processes run for 10 seconds
    each process opens foo.txt for 1 second
    1 process stats foo.txt for 1 second

    The result would be that foo.txt shows 5 seconds of meta time. So you can do 5 seconds / 4 processes for an average of 1.25s, or you could do 5s / (10s * 4p) = 0.125, or 12.5%.
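
  The arithmetic in this example can be written out as a short Python sketch. The function and variable names are made up for illustration and are not part of Darshan or its tools.

```python
def meta_time_stats(total_meta_seconds, runtime_seconds, nprocs):
    """Return (average meta time per process, fraction of total process time).

    total_meta_seconds is the meta time accumulated across all processes,
    as reported for a shared file (rank -1)."""
    avg_per_process = total_meta_seconds / nprocs
    fraction = total_meta_seconds / (runtime_seconds * nprocs)
    return avg_per_process, fraction

# 4 processes each open foo.txt for 1 s, and 1 process stats it for 1 s.
total = 4 * 1.0 + 1 * 1.0
avg, frac = meta_time_stats(total, runtime_seconds=10.0, nprocs=4)
print(avg, frac)
```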

  If the rank value is > -1, then the reported time is just for that one rank and you may need to sum the values across ranks.

  If your meta time for a single file exceeds the runtime * number of processes, then something must be wrong with the timer collection. Can you send the log file? You can also look at the _START_TIMESTAMP and _END_TIMESTAMP counters and see whether they agree with the runtime.
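
  A minimal Python sketch of the two steps above: summing per-rank records for each file, then checking each total against runtime * number of processes. The record layout (rank, file name, meta seconds) and all values are hypothetical; in practice the data would have to be extracted from the Darshan log first.

```python
from collections import defaultdict

def total_meta_time_per_file(records):
    """Sum meta time per file. A rank of -1 marks a shared record whose
    time is already accumulated across processes; records with rank > -1
    cover a single rank each and are added together."""
    totals = defaultdict(float)
    for rank, filename, meta_seconds in records:
        totals[filename] += meta_seconds
    return dict(totals)

def exceeds_budget(meta_seconds, runtime_seconds, nprocs):
    """True if the meta time is larger than runtime * number of processes."""
    return meta_seconds > runtime_seconds * nprocs

# Hypothetical records: (rank, file name, meta time in seconds)
records = [
    (-1, "foo.txt", 5.0),   # shared file, already accumulated
    (0, "bar.txt", 0.5),    # per-rank records for the same file
    (1, "bar.txt", 0.25),
]
totals = total_meta_time_per_file(records)
suspicious = {name: exceeds_budget(t, runtime_seconds=10.0, nprocs=4)
              for name, t in totals.items()}
print(totals)
print(suspicious)
```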

kevin



________________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Florian Lecomte <flo.lecomte17 at gmail.com>
Sent: Tuesday, August 3, 2021 9:37 AM
To: darshan-users at lists.mcs.anl.gov
Subject: [Darshan-users] Does Darshan uses CPU time ?

Good evening,
I'd like to know whether I have to divide metrics by the number of CPUs of the machine I use if I want to know, for example, the percentage of time spent on write operations.
When I divide, for example, STDIO_F_META_TIME by [real runtime (time spent in the "real world") * number of processes], I often get something bigger than 1, which is not supposed to happen.
So to sum it up: does Darshan give the average metric value per process per CPU, or only per process, in which case it cannot be compared with the real elapsed time?

Thank you very much.
Cordially, Florian, student in the HPC field.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ccross6_analysis.ipynb
Type: application/octet-stream
Size: 24657 bytes
Desc: ccross6_analysis.ipynb
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20210805/2a429aa2/attachment-0001.obj>

