[Darshan-users] Unexpected Darshan I/O characterization running IOR BM

Cormac Garvey cormac.t.garvey at gmail.com
Mon Nov 26 18:35:00 CST 2018


Hi Phil,
Here is the information you requested. (See attached files)

strace_ior_write.log - running strace on a single ior process (no mpirun)
during the write step
strace_ior_read.log - running strace on a single ior process (no mpirun)
during the read step (reading the files generated in the write step)
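
For context, each trace was captured by running a single ior process under
strace, roughly like this (an approximation; the exact flags may have differed):

strace -f -o strace_ior_write.log ./ior -a POSIX -w -k -v -o /mnt/beegfs/testing/testfile -t 32m -b 256M
strace -f -o strace_ior_read.log ./ior -a POSIX -r -k -z -v -o /mnt/beegfs/testing/testfile -t 32m -b 256M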

I do not have the original build environment for the ior binary I am using,
but I am sure beegfs was not enabled in the build: I did not explicitly
enable it, and I recall that during the build process it could not find the
necessary beegfs header files. I have attached the output from running nm
on the ior executable (nm ior >& nm_ior.log) instead.
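
(For anyone looking at the nm output: a quick filter such as
"nm ior | grep -i beegfs" should come back empty if no beegfs-specific
symbols were linked in.)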

Also:
ldd /avere/apps/ior/bin/ior
        linux-vdso.so.1 =>  (0x00007ffd26b7e000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f981c0a7000)
        libmpi.so.12 => /opt/intel/compilers_and_libraries_2016.3.223/linux/mpi/intel64/lib/libmpi.so.12 (0x00007f981b8d7000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f981b514000)
        /lib64/ld-linux-x86-64.so.2 (0x000055a13c2f4000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f981b30c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f981b107000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f981aef1000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f981acd5000)

Thanks for your support,
Cormac.

On Mon, Nov 26, 2018 at 2:16 PM Carns, Philip H. <carns at mcs.anl.gov> wrote:

> I just glanced through the latest IOR revision in git and realized that it
> has some conditional logic for beegfs that will use a beegfs-specific
> library for some calls.
>
> I'm not sure if that is being triggered in your case or not, but that's a
> possibility.  Darshan doesn't have wrappers for that API.
>
> Could you check to see if HAVE_BEEGFS_BEEGFS_H is set in your src/config.h
> file?
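>
> For example, from the top of the IOR source tree something like
>
> grep HAVE_BEEGFS_BEEGFS_H src/config.h
>
> should show whether it was picked up (typically a "#define
> HAVE_BEEGFS_BEEGFS_H 1" line if it was, or a commented-out "#undef" if not).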
>
> thanks,
> -Phil
>
>
>
>
> On 2018-11-26 15:50:10-05:00 Darshan-users wrote:
>
> Thanks Cormac.  I don't see any obvious problems from your output.  We'll
> see if we can reproduce here.
>
> In the meantime, if you want to debug further, you could try running IOR
> as a single process (no mpirun or mpiexec) through strace to see if there
> is anything unusual in the file open path.
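>
> For example (just a sketch; adjust the path and arguments to match your run):
>
> strace -f -o ior_single.strace ./ior -a POSIX -r -k -z -v -o /mnt/beegfs/testing/testfile -t 32m -b 256M
>
> and then check the open()/openat() and read() calls on the test file in the
> resulting log.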
>
> thanks,
> -Phil
>
>
>
> On 2018-11-21 17:37:50-05:00 Cormac Garvey wrote:
>
> Hi Phil,
> Thanks for looking into this for me.
> Here is the information you requested.
> ldd ./ior
>         linux-vdso.so.1 =>  (0x00007fff0237c000)
>         libm.so.6 => /lib64/libm.so.6 (0x00007fce3e88f000)
>         libmpi.so.12 => /opt/intel/compilers_and_libraries_2016.3.223/linux/mpi/intel64/lib/libmpi.so.12 (0x00007fce3e0bf000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007fce3dcfc000)
>         /lib64/ld-linux-x86-64.so.2 (0x000055de86c95000)
>         librt.so.1 => /lib64/librt.so.1 (0x00007fce3daf4000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00007fce3d8ef000)
>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fce3d6d9000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fce3d4bd000)
> [testusera at ip-0A021004 bin]$
> Here is the info on testfile.0 and testfile.1 before executing the IOR
> read test.
> [testusera at ip-0A021004 testing]$ pwd
> /mnt/beegfs/testing
> [testusera at ip-0A021004 testing]$ ls -lt
> total 4194304
> -rw-r--r--. 1 testusera domain users 2147483648 Nov 21 22:18 testfile.1
> -rw-r--r--. 1 testusera domain users 2147483648 Nov 21 22:17 testfile.0
> [testusera at ip-0A021004 testing]$
> [testusera at ip-0A021004 testing]$ stat testfile.0
>   File: ‘testfile.0’
>   Size: 2147483648      Blocks: 4194304    IO Block: 524288 regular file
> Device: 29h/41d Inode: 2501562826468697566  Links: 1
> Access: (0644/-rw-r--r--)  Uid: (1264001104/testusera)   Gid: (1264000513/domain users)
> Context: system_u:object_r:tmp_t:s0
> Access: 2018-11-21 22:17:41.000000000 +0000
> Modify: 2018-11-21 22:17:56.000000000 +0000
> Change: 2018-11-21 22:17:56.000000000 +0000
>  Birth: -
> Results from running the IOR read test:
> IOR-3.2.0: MPI Coordinated Test of Parallel I/O
> Began               : Wed Nov 21 22:22:52 2018
> Command line        : /avere/apps/ior/bin/ior -a POSIX -B -r -k -z -v -o /mnt/beegfs/testing/testfile -i 2 -m -t 32m -b 256M -d 1
> Machine             : Linux ip-0A021005
> Start time skew across all tasks: 0.00 sec
> TestID              : 0
> StartTime           : Wed Nov 21 22:22:52 2018
> Path                : /mnt/beegfs/testing
> FS                  : 11.8 TiB   Used FS: 0.0%   Inodes: 0.0 Mi   Used Inodes: -nan%
> Participating tasks: 8
> Options:
> api                 : POSIX
> apiVersion          :
> test filename       : /mnt/beegfs/testing/testfile
> access              : single-shared-file
> type                : independent
> segments            : 1
> ordering in a file  : random
> ordering inter file : no tasks offsets
> tasks               : 8
> clients per node    : 8
> repetitions         : 2
> xfersize            : 32 MiB
> blocksize           : 256 MiB
> aggregate filesize  : 2 GiB
> Results:
> access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
> ------    ---------  ---------- ---------  --------   --------   --------   --------   ----
> delaying 1 seconds . . .
> Commencing read performance test: Wed Nov 21 22:22:53 2018
> read      481.88     262144     32768      0.020854   4.23       2.97       4.25       0
> delaying 1 seconds . . .
> Commencing read performance test: Wed Nov 21 22:22:58 2018
> read      609.04     262144     32768      0.020336   3.34       2.06       3.36       1
> Max Read:  609.04 MiB/sec (638.62 MB/sec)
> Summary of all tests:
> Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
> read          609.04     481.88     545.46      63.58      19.03      15.06      17.05       1.99    3.80633     0      8   8    2   0     0     1         0    0      1 268435456 33554432    2048.0 POSIX      0
> Finished            : Wed Nov 21 22:23:02 2018
> I also ran a test without the -B option and got the same result (no reads
> recorded).
> Thanks,
> Cormac.
>
> On Wed, Nov 21, 2018 at 1:46 PM Carns, Philip H. via Darshan-users <
> darshan-users at lists.mcs.anl.gov> wrote:
>
>> Hi Cormac,
>>
>> That's strange.  The -i 2 and -m options make IOR attempt to read two
>> files: testfile.0 and testfile.1.  Darshan is definitely aware of those
>> files, but not only are the read counters missing, the open counters are
>> missing as well.  It only shows a stat from each rank for the files in
>> question.
>>
>> Can you confirm that both of the files exist and are big enough to
>> accommodate the requested read volume?  I wonder if IOR might stat the
>> files up front and exit early if they aren't the correct size?
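>>
>> For example, something along these lines (paths assumed) would show both:
>>
>> ls -l /mnt/beegfs/testing/testfile.0 /mnt/beegfs/testing/testfile.1
>>
>> With 8 tasks, 1 segment, and a 256 MiB block size, each file should be
>> 8 * 256 MiB = 2 GiB (2147483648 bytes).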
>>
>> Could you also share the output of "ldd ior"?  I'm curious to make sure
>> there isn't anything unusual about the libraries linked in, but usually if
>> that were a problem you wouldn't get a log at all.
>>
>> Also, one last idea: does the behavior change if you remove the -B
>> (O_DIRECT) option?  That shouldn't matter from Darshan's perspective, but
>> it can't hurt to check.
>>
>> thanks,
>> -Phil
>>
>>
>>
>> On 2018-11-20 19:59:55-05:00 Darshan-users wrote:
>>
>> Hello,
>> I recently installed Darshan 3.1.6 on a Microsoft Azure VM running CentOS 7.4.
>> I got an unexpected result using Darshan to characterize the I/O of an IOR
>> benchmark experiment.
>> export LD_PRELOAD=/avere/apps/spack/linux-centos7-x86_64/gcc-4.8.5/darshan-runtime-3.1.6-daohky4yevagxajjl33lwk472lcgn6g4/lib/libdarshan.so
>> BLOCKSIZE=256M
>> mpirun -np 8 ior -a POSIX -B -r -k -z -v -o $FILEPATH -i 2 -m -t 32m -b 256M -d 1
>> After the job completed, a Darshan log file was created. The resulting
>> text report (darshan-parser_ior_read_shared.out, attached here) was
>> generated using the following command:
>> darshan-parser --all testuser_ior_id22-20-80229-14589323501426222819_1.darshan >& darshan-parser_ior_read_shared.out
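>> (A quick way to see the symptom in that output is to grep the read
>> counters, e.g. "grep POSIX_READS darshan-parser_ior_read_shared.out";
>> they show no read activity for the test files.)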
>> The above IOR benchmark is a read-only benchmark against a shared file
>> system, but the resulting Darshan report indicates there are no READ
>> operations. Any ideas why the resulting Darshan report has no read
>> operations? (Note: if I add the -w option to the above IOR benchmark,
>> i.e. do a write and a read to the shared filesystem, Darshan only
>> reports the writes and not the reads.)
>> Any help would be appreciated,
>> Thanks for your support,
>> Cormac.
>>
>> _______________________________________________
>> Darshan-users mailing list
>> Darshan-users at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nm_ior.log
Type: application/octet-stream
Size: 12473 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20181126/0bcb3a45/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace_ior_read.log
Type: application/octet-stream
Size: 113780 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20181126/0bcb3a45/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace_ior_write.log
Type: application/octet-stream
Size: 114314 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20181126/0bcb3a45/attachment-0005.obj>

