[Darshan-users] Hang on post-process
Latham, Robert J.
robl at mcs.anl.gov
Tue Jul 13 13:46:42 CDT 2021
Howdy Jeff: thanks for sending the log file.
It looks like a legitimate log file to me. `darshan-parser`, which
simply dumps the counters and such to stdout, gives me reasonable-looking
output. Here's the header:
# darshan log version: 3.21
# compression method: ZLIB
# exe: python3 cifar10-4-checkpoint.py
# uid: 1000
# jobid: 6041
# start_time: 1626196275
# start_time_asci: Tue Jul 13 12:11:15 2021
# end_time: 1626196561
# end_time_asci: Tue Jul 13 12:16:01 2021
# nprocs: 1
# run time: 287
# metadata: lib_ver = 3.3.1
# metadata: h = romio_no_indep_rw=true;cb_nodes=4
# log file regions
# -------------------------------------------------------
# header: 360 bytes (uncompressed)
# job data: 543 bytes (compressed)
# record table: 18164 bytes (compressed)
# POSIX module: 41682 bytes (compressed), ver=4
# STDIO module: 230 bytes (compressed), ver=2
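A quick way to sanity-check a header like this is to pull the `# key: value` lines into a dictionary and compare the timestamps against the reported run time. The sketch below is only illustrative: `parse_header` is a hypothetical helper, not part of darshan-util, and the sample text is a subset of the header above. In this log the reported run time equals end_time - start_time + 1; treating that as a general rule is an assumption.

```python
def parse_header(text):
    """Collect '# key: value' lines from a parser dump into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Skip anything that is not a '# key: value' comment line.
        if not line.startswith("# ") or ": " not in line:
            continue
        key, value = line[2:].split(": ", 1)
        fields[key.strip()] = value.strip()
    return fields


# Subset of the header shown in this message.
HEADER = """\
# darshan log version: 3.21
# exe: python3 cifar10-4-checkpoint.py
# start_time: 1626196275
# end_time: 1626196561
# nprocs: 1
# run time: 287
"""

fields = parse_header(HEADER)
# Assumption: run time counts seconds inclusively, which matches this log.
elapsed = int(fields["end_time"]) - int(fields["start_time"]) + 1
print("run time consistent:", elapsed == int(fields["run time"]))
```

If the timestamps and run time disagree, or the header fields are missing altogether, that would point at a damaged log rather than at the post-processing tools.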
A darshan-job-summary.pl that I built back in August 2020 also generates
a PDF for me in a few seconds. I've attached it for you, but really we
should figure out what's going on in your environment.
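While tracking down a hang like this, it can help to run the post-processing step under a time limit instead of killing it by hand. This is a minimal sketch using Python's standard `subprocess.run` with its `timeout` argument; `run_with_timeout` is a hypothetical helper, and the demo uses a stand-in command rather than the real darshan-job-summary.pl invocation. Note that only the direct child is killed on timeout, so any helper programs the script has started may need separate cleanup.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit status, or None if it was killed after `seconds`."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising this exception.
        return None


# Demo with a stand-in command; swap in the real post-processing command
# and log file name when trying this on an actual log.
status = run_with_timeout([sys.executable, "-c", "print('done')"], seconds=60)
print("timed out" if status is None else f"exit status {status}")
```

A `None` result after a generous limit confirms the hang is reproducible and gives a clean point at which to look at what the process was doing.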
==rob
On Tue, 2021-07-13 at 13:41 -0400, Jeffrey Layton wrote:
> Good afternoon,
>
> Apologies for posting yet another problem :) I'm trying to use
> Darshan on a Tensorflow/Keras script. It's a simple model operating
> on the CIFAR-10 data set (fairly small). Darshan produces the output
> files but when I try to post-process one using
> darshan-job-summary.pl, it hangs and I end up having to kill the
> process (I waited about an hour, just to be sure).
>
> I run the script using the following:
>
> export DARSHAN_EXCLUDE_DIRS=/proc,/etc,/dev,/sys
> env LD_PRELOAD=/home/laytonjb/bin/darshan-3.3.1/lib/libdarshan.so python3 cifar10-4-checkpoint.py
>
> (I can provide the script if needed). It produces four files:
>
> $ ls -s
> total 72
>  4 laytonjb_ptxas_id6210-6210_7-13-47480-2131301613401632697_1.darshan
>  4 laytonjb_ptxas_id6211-6211_7-13-47480-2131301613401632697_1.darshan
> 60 laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan
>  4 laytonjb_uname_id6056-6056_7-13-47475-2131301613401632697_1.darshan
>
>
> I chose to post-process the "python3" output but this is where it
> hangs. I'm attaching the darshan output file if that is of any help.
>
> Thanks for any help.
>
> Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan.pdf
Type: application/pdf
Size: 81576 bytes
Desc: laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan.pdf
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20210713/71a5eadb/attachment-0001.pdf>