[Darshan-users] Hang on post-process

Latham, Robert J. robl at mcs.anl.gov
Tue Jul 13 13:46:42 CDT 2021


Howdy Jeff: thanks for sending the log file.

It looks like a legitimate log file to me.  `darshan-parser`, which
simply dumps the counters and such to stdout, gives me reasonable-looking
output.  Here's the header:

# darshan log version: 3.21
# compression method: ZLIB
# exe: python3 cifar10-4-checkpoint.py 
# uid: 1000
# jobid: 6041
# start_time: 1626196275
# start_time_asci: Tue Jul 13 12:11:15 2021
# end_time: 1626196561
# end_time_asci: Tue Jul 13 12:16:01 2021
# nprocs: 1
# run time: 287
# metadata: lib_ver = 3.3.1
# metadata: h = romio_no_indep_rw=true;cb_nodes=4

# log file regions
# -------------------------------------------------------
# header: 360 bytes (uncompressed)
# job data: 543 bytes (compressed)
# record table: 18164 bytes (compressed)
# POSIX module: 41682 bytes (compressed), ver=4
# STDIO module: 230 bytes (compressed), ver=2
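
As a quick sanity check on a header like the one above, the `# key: value`
lines can be parsed and cross-checked.  The sketch below is only an
illustration: `parse_header` and `region_bytes` are hypothetical helper
names, not part of Darshan, and the header text is pasted in by hand
rather than read from the tool.  For this log, the reported run time
equals end_time - start_time + 1, and the region sizes add up to a little
under 60 KiB, which looks consistent with the 60 blocks that `ls -s`
shows for the python3 log.

```python
import re

# Header text copied by hand from the dump above (not read from a real log).
HEADER = """\
# darshan log version: 3.21
# compression method: ZLIB
# exe: python3 cifar10-4-checkpoint.py
# uid: 1000
# jobid: 6041
# start_time: 1626196275
# end_time: 1626196561
# nprocs: 1
# run time: 287
# metadata: lib_ver = 3.3.1
# metadata: h = romio_no_indep_rw=true;cb_nodes=4
# header: 360 bytes (uncompressed)
# job data: 543 bytes (compressed)
# record table: 18164 bytes (compressed)
# POSIX module: 41682 bytes (compressed), ver=4
# STDIO module: 230 bytes (compressed), ver=2
"""


def parse_header(text):
    """Collect '# key: value' lines into a dict of lists (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"#\s*([^:]+):\s*(.*)$", line)
        if match:
            key, value = match.group(1).strip(), match.group(2).strip()
            # Keys such as 'metadata' repeat, so keep every value.
            fields.setdefault(key, []).append(value)
    return fields


def region_bytes(fields):
    """Sum every value of the form '<N> bytes ...' (hypothetical helper)."""
    total = 0
    for values in fields.values():
        for value in values:
            match = re.match(r"(\d+) bytes\b", value)
            if match:
                total += int(match.group(1))
    return total


fields = parse_header(HEADER)
elapsed = int(fields["end_time"][0]) - int(fields["start_time"][0]) + 1
total = region_bytes(fields)
print("run time matches:", elapsed == int(fields["run time"][0]))
print("log regions total:", total, "bytes")
```

If either check failed on some other log, that would be a hint to look at
the log itself before looking at the post-processing environment.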

And a darshan-job-summary.pl that I built back in August 2020 generates
a PDF for me in a few seconds.  I've attached it for you, but really we
should figure out what's going on in your environment.


==rob

On Tue, 2021-07-13 at 13:41 -0400, Jeffrey Layton wrote:
> Good afternoon,
> 
> Apologies for posting yet another problem :)  I'm trying to use
> Darshan on a TensorFlow/Keras script. It's a simple model operating
> on the CIFAR-10 data set (fairly small). Darshan produces the output
> files, but when I try to post-process one using
> darshan-job-summary.pl, it hangs and I end up having to kill the
> process (I waited about an hour, just to be sure).
> 
> I run the script using the following:
> 
> export DARSHAN_EXCLUDE_DIRS=/proc,/etc,/dev,/sys
> env LD_PRELOAD=/home/laytonjb/bin/darshan-3.3.1/lib/libdarshan.so python3 cifar10-4-checkpoint.py
> 
> (I can provide the script if needed). It produces four files:
> 
> $ ls -s
> total 72
>  4 laytonjb_ptxas_id6210-6210_7-13-47480-2131301613401632697_1.darshan
>  4 laytonjb_ptxas_id6211-6211_7-13-47480-2131301613401632697_1.darshan
> 60 laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan
>  4 laytonjb_uname_id6056-6056_7-13-47475-2131301613401632697_1.darshan
> 
> 
> I chose to post-process the "python3" output but this is where it
> hangs. I'm attaching the darshan output file if that is of any help.
> 
> Thanks for any help.
> 
> Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan.pdf
Type: application/pdf
Size: 81576 bytes
Desc: laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan.pdf
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20210713/71a5eadb/attachment-0001.pdf>

