[Darshan-users] Darshan & IPM results not the same
vsoni at mercator-ocean.fr
Tue Nov 9 03:47:20 CST 2021
The code does not use threading. And yes, there are many files that I don't see in the Darshan log, and they are relatively large compared to the ones that were intercepted.
The application does call fread(), but the STDIO module does not report a significant value in total_STDIO_F_READ_TIME.
I noticed warnings in the POSIX and STDIO modules about incomplete data in the log. However, the log does not change even after setting DARSHAN_MODMEM to 1024 MiB.
Also, even though the application uses only 110 GB of the 256 GB of memory per node, setting DARSHAN_MODMEM to a higher value such as 4096 MiB crashes the job (which makes me think that this value is per process, with 128 processes per node?).
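The per-process reading can be checked with rough arithmetic. The sketch below assumes that DARSHAN_MODMEM applies to each process, that 128 processes run on each node, and that the 110 GB application footprint is a per-node total; it also treats GB and GiB as interchangeable. These are assumptions for illustration, not confirmed facts about this job, and the function name is hypothetical.

```python
MIB_PER_GIB = 1024

def worst_case_node_gib(modmem_mib, procs_per_node, app_gib):
    """Upper bound on node memory if every process fills its Darshan buffer."""
    return app_gib + modmem_mib * procs_per_node / MIB_PER_GIB

NODE_GIB = 256        # node memory, from the message
APP_GIB = 110         # application footprint, from the message (assumed per node)
PROCS_PER_NODE = 128  # assumption: "128 per node" means 128 processes

for modmem_mib in (1024, 4096):
    total = worst_case_node_gib(modmem_mib, PROCS_PER_NODE, APP_GIB)
    verdict = "fits" if total <= NODE_GIB else "exceeds node memory"
    print(f"DARSHAN_MODMEM={modmem_mib} MiB -> up to {total:.0f} GiB: {verdict}")
```

Under these assumptions 1024 MiB stays within the node's memory while 4096 MiB cannot, which is consistent with the crash but does not by itself prove that the limit is per process.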
Is there a runtime environment variable for excluding a group of files rather than entire directories?
From: Harms, Kevin <harms at alcf.anl.gov>
Sent: Monday, November 8, 2021 8:36 PM
To: Vineet Soni <vsoni at mercator-ocean.fr>; darshan-users at lists.mcs.anl.gov
Subject: Re: Darshan & IPM results not the same
a few ideas:
- is the I/O done using fread() or similar? These are accounted under the STDIO module rather than the POSIX module. Can you check to see what STDIO module shows?
- is the application threaded? It could be an issue with threading, but given the size of the disparity that seems less likely.
- Perhaps an issue with Darshan not intercepting a subset of the calls your application is making. If you look at the file name list, does it seem obvious that Darshan is missing I/O from some set of files? (This could also be due to files being caught under the exclude list)
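The file-list check in the last point could be sketched as a set comparison. Both inputs are hypothetical: `expected` stands for paths the application is known to read, and `recorded` for file names extracted from the Darshan log by whatever means is available. The example paths and both helper functions are made up for illustration.

```python
def missing_files(expected, recorded):
    """Return expected paths that do not appear among the recorded names, sorted."""
    return sorted(set(expected) - set(recorded))

def group_by_prefix(paths, depth=2):
    """Count paths per leading directory prefix, to spot a whole directory being skipped."""
    counts = {}
    for path in paths:
        parts = path.strip("/").split("/")
        prefix = "/" + "/".join(parts[:depth])
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts

# Hypothetical example data.
expected = [
    "/scratch/run1/forcing_001.nc",
    "/scratch/run1/forcing_002.nc",
    "/home/user/namelist.cfg",
]
recorded = ["/home/user/namelist.cfg"]

missing = missing_files(expected, recorded)
print(missing)
print(group_by_prefix(missing))
```

If the missing files cluster under one prefix, an exclusion rule or a module memory limit is a more plausible cause than individual calls not being intercepted.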
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Vineet Soni <vsoni at mercator-ocean.fr>
Sent: Monday, November 8, 2021 4:05 AM
To: darshan-users at lists.mcs.anl.gov
Subject: [Darshan-users] Darshan & IPM results not the same
I am trying to analyze the IO behavior of our codes with Darshan.
Compilers: Intel 2018
MPI: Intel MPI 2018
FS: Lustre (lustre-module disabled in Darshan configuration)
Darshan profiling: LD_PRELOAD
I observe a big difference between the IO results from Darshan and IPM (v2.0.5) for one of our codes. Could it be that the two profilers do not profile the same set of POSIX calls?
The POSIXIO calls profiled in IPM are:
fopen, fdopen, freopen, open, open64
fseek, lseek, lseek64
fgetpos, fsetpos, fgetc, getc, ungetc
truncate, ftruncate, truncate64, ftruncate64
While the ones profiled by Darshan are: https://github.com/darshan-hpc/darshan/blob/main/darshan-runtime/lib/darshan-posix.c ?
However, the largest difference is in the “read” call, which both profilers instrument.
| | IPM | Darshan |
| Read (s) | 324.57 | 6.02 |
| Agg. Read (count) | 34 766 456 | 2 946 271 |
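To put the table in proportion, the ratios and the average time per read call can be computed from the four figures above. This is only arithmetic on the reported numbers; it assumes both tools report aggregate seconds and aggregate call counts on the same basis, which the thread does not confirm.

```python
# Figures copied from the table above.
ipm = {"read_s": 324.57, "read_count": 34_766_456}
darshan = {"read_s": 6.02, "read_count": 2_946_271}

def per_call_us(stats):
    """Average time per read call, in microseconds."""
    return stats["read_s"] / stats["read_count"] * 1e6

time_ratio = ipm["read_s"] / darshan["read_s"]
count_ratio = ipm["read_count"] / darshan["read_count"]

print(f"read time ratio  (IPM / Darshan): {time_ratio:.1f}")
print(f"read count ratio (IPM / Darshan): {count_ratio:.1f}")
print(f"avg time per read, IPM:     {per_call_us(ipm):.2f} us")
print(f"avg time per read, Darshan: {per_call_us(darshan):.2f} us")
```

IPM reports roughly 12 times as many read calls but roughly 54 times as much read time, so the reads missing from the Darshan log appear to be slower on average than the ones it recorded.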
I also tested Darshan and IPM with other codes (which do NOT read the same files) to check whether they show this issue as well, but for those codes the two tools gave the same results.
So I don't understand why the results differ for this application.
Do you have any idea why this could happen?
Thanks in advance.
PS: The application does a lot of IO, and is expected to spend a significant time in read operations.