[Darshan-users] Darshan & IPM results not the same

Vineet Soni vsoni at mercator-ocean.fr
Tue Nov 9 03:47:20 CST 2021


Hi Kevin,

The code does not use threading. And yes, many files are missing from the Darshan log, and they are relatively large compared to the ones that were intercepted.
The application does call fread(), but the STDIO module reports no significant value for total_STDIO_F_READ_TIME.

I noticed that the log contains warnings from the POSIX and STDIO modules about incomplete data. However, the log does not change even after setting DARSHAN_MODMEM to 1024 MiB.
Also, although the application uses only 110 GB of the 256 GB of memory per node, setting DARSHAN_MODMEM to a higher value such as 4096 MiB crashes the job (which makes me think that this value is per process, with 128 processes per node?).
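
The per-process suspicion can be checked with some quick arithmetic. The sketch below is only an illustration under stated assumptions: DARSHAN_MODMEM is treated as a per-process limit that every process fully consumes, "128 per node" is read as 128 processes per node, and the 110 GB and 256 GB figures are taken as decimal gigabytes. The function name is hypothetical.

```python
MIB = 1024 ** 2   # bytes in one MiB
GB = 1000 ** 3    # bytes in one decimal GB (assumed unit of the 110/256 figures)

def node_memory_demand_gb(modmem_mib, procs_per_node, app_gb):
    """Worst-case memory per node in GB, assuming each process uses its full DARSHAN_MODMEM."""
    darshan_bytes = modmem_mib * MIB * procs_per_node
    return app_gb + darshan_bytes / GB

for modmem_mib in (1024, 4096):
    demand = node_memory_demand_gb(modmem_mib, procs_per_node=128, app_gb=110)
    print(f"DARSHAN_MODMEM={modmem_mib} MiB: {demand:.1f} GB per node, "
          f"fits in 256 GB: {demand <= 256}")
```

Under those assumptions, 1024 MiB per process stays just under the 256 GB available on a node, while 4096 MiB per process far exceeds it, which would be consistent with the crash.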

Is there a runtime environment variable for excluding a group of files rather than directories?

Thanks,
Vineet


-----Original Message-----
From: Harms, Kevin <harms at alcf.anl.gov> 
Sent: Monday, November 8, 2021 8:36 PM
To: Vineet Soni <vsoni at mercator-ocean.fr>; darshan-users at lists.mcs.anl.gov
Subject: Re: Darshan & IPM results not the same

Vineet,

  a few ideas:
  - is the I/O done using fread() or similar? These calls are accounted under the STDIO module rather than the POSIX module. Can you check what the STDIO module shows?
  - is the application threaded? It's possibly an issue with threading, but given the size of the disparity that seems less likely.
  - Perhaps darshan is not intercepting a subset of the calls your application is making. If you look at the file name list, does it seem obvious that darshan is missing I/O from some set of files? (This could also be due to files being caught by the exclude list)
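
One way to compare what the STDIO and POSIX modules recorded is to sum the read-time counters per module from a text dump of the log. A minimal sketch, assuming the log was dumped with darshan-parser and that each data line is whitespace-separated as module, rank, record id, counter, value, file name, and so on, with comment lines starting with '#'. The function name and the sample lines are made up for illustration and are not real log output.

```python
from collections import defaultdict

def sum_read_time(lines):
    """Sum <MODULE>_F_READ_TIME values per module from darshan-parser style text lines."""
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Split off the first five fields; the rest (file name, mount, fs) stays together.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        module, counter, value = fields[0], fields[3], fields[4]
        if counter == f"{module}_F_READ_TIME":
            totals[module] += float(value)
    return dict(totals)

# Made-up sample in the assumed layout.
sample = [
    "# module\trank\trecord id\tcounter\tvalue\tfile name\tmount pt\tfs type",
    "POSIX\t0\t111\tPOSIX_F_READ_TIME\t1.5\t/data/a.nc\t/data\tlustre",
    "POSIX\t1\t111\tPOSIX_F_READ_TIME\t0.5\t/data/a.nc\t/data\tlustre",
    "POSIX\t0\t111\tPOSIX_READS\t42\t/data/a.nc\t/data\tlustre",
    "STDIO\t0\t222\tSTDIO_F_READ_TIME\t0.25\t/data/b.txt\t/data\tlustre",
]
print(sum_read_time(sample))
```

For a real log, the same function could be fed the lines of a file produced by redirecting the darshan-parser output for that log into a text file.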

kevin

________________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Vineet Soni <vsoni at mercator-ocean.fr>
Sent: Monday, November 8, 2021 4:05 AM
To: darshan-users at lists.mcs.anl.gov
Subject: [Darshan-users] Darshan & IPM results not the same

Hello,

I am trying to analyze the IO behavior of our codes with Darshan.
Darshan: 3.3.0
Compilers: Intel 2018
MPI: Intel MPI 2018
FS: Lustre (lustre-module disabled in Darshan configuration)
Darshan profiling: LD_PRELOAD

I observe a big difference between the IO results from Darshan and IPM (v2.0.5) for one of our codes. I suspect that the two profilers are not profiling the same set of POSIX calls?

The POSIXIO calls profiled in IPM are:

fopen, fdopen, freopen, open, open64
fclose, close
fflush
fread, read
fwrite, write
fseek, lseek, lseek64
ftell
rewind
fgetpos, fsetpos, fgetc, getc, ungetc
creat
truncate, ftruncate, truncate64, ftruncate64

Are the ones profiled by Darshan those listed in https://github.com/darshan-hpc/darshan/blob/main/darshan-runtime/lib/darshan-posix.c ?

However, the huge difference appears in the “read” call, which both profilers instrument.

+-------------------+------------+-----------+
|                   |     IPM    |  Darshan  |
+-------------------+------------+-----------+
| Read (s)          |     324.57 |      6.02 |
+-------------------+------------+-----------+
| Agg. Read (count) | 34 766 456 | 2 946 271 |
+-------------------+------------+-----------+

I also tested Darshan and IPM with other codes (which do NOT read the same files) to check whether they show the same issue. For those codes, however, the two profilers gave the same results.
So, I don't understand why the two profilers disagree for this application.

Do you have any idea of why this could happen?

Thanks in advance.

PS: The application does a lot of IO, and is expected to spend a significant time in read operations.

Best regards,
Vineet


