[Darshan-users] Hang on post-process

Jeffrey Layton laytonjb at gmail.com
Tue Jul 13 14:00:23 CDT 2021


Thanks Rob!! I appreciate the pdf (at least I won't look like a slacker,
since it shows I actually produced something).

What steps do you want to take to debug the issue? I'm guessing it's a
configuration issue or dependency issue on my side. BTW - I'm running
Ubuntu 20.04 on an AMD system.  I built Darshan 3.3.1 using gcc 9.3.0
(Ubuntu 20.04 version).
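
One way I could narrow down a dependency problem on my side is to check
that the helper programs are on my PATH and to run the summary script under
a time limit, so a hang gets reported instead of me waiting an hour. Below
is a minimal Python sketch of that idea. The tool list (perl, pdflatex,
epstopdf, gnuplot) is my recollection of what darshan-job-summary.pl needs
and should be treated as an assumption; the function names and the
120-second limit are made up for illustration.

```python
import shutil
import subprocess
import sys

# Assumed helper programs for darshan-job-summary.pl; adjust for your install.
ASSUMED_TOOLS = ["perl", "pdflatex", "epstopdf", "gnuplot"]


def missing_tools(tools=ASSUMED_TOOLS):
    """Return the entries of `tools` that are not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def run_with_timeout(cmd, seconds=120):
    """Run cmd; return its exit code, or None if it exceeded the time limit."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None


if __name__ == "__main__":
    print("missing tools:", missing_tools())
    # Pass the real command as arguments, e.g.
    #   python3 check.py darshan-job-summary.pl <log file>
    if len(sys.argv) > 1:
        print("exit code (None = timed out):", run_with_timeout(sys.argv[1:]))
```

An empty "missing tools" list would point away from a missing dependency,
and a None result would at least confirm the hang in a repeatable way.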

Thanks!

Jeff


On Tue, Jul 13, 2021 at 2:46 PM Latham, Robert J. <robl at mcs.anl.gov> wrote:

> Howdy Jeff: thanks for sending the log file
>
> It looks like a legitimate log file to me.  `darshan-parser`, which
> simply dumps the counters and such to stdout, gives me reasonable-looking
> output.  Here's the header:
>
> # darshan log version: 3.21
> # compression method: ZLIB
> # exe: python3 cifar10-4-checkpoint.py
> # uid: 1000
> # jobid: 6041
> # start_time: 1626196275
> # start_time_asci: Tue Jul 13 12:11:15 2021
> # end_time: 1626196561
> # end_time_asci: Tue Jul 13 12:16:01 2021
> # nprocs: 1
> # run time: 287
> # metadata: lib_ver = 3.3.1
> # metadata: h = romio_no_indep_rw=true;cb_nodes=4
>
> # log file regions
> # -------------------------------------------------------
> # header: 360 bytes (uncompressed)
> # job data: 543 bytes (compressed)
> # record table: 18164 bytes (compressed)
> # POSIX module: 41682 bytes (compressed), ver=4
> # STDIO module: 230 bytes (compressed), ver=2
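
For comparing this header against what my own install prints, the
"# key: value" lines can be read into a dictionary. This is a minimal
Python sketch, assuming the text was captured from the parser's stdout;
parse_header is a hypothetical helper, not part of Darshan. The sample is
a subset of the header quoted above.

```python
def parse_header(text):
    """Collect '# key: value' lines into a dict.

    Repeated keys (such as 'metadata') keep only their last value.
    """
    fields = {}
    for line in text.splitlines():
        if not line.startswith("# ") or ": " not in line:
            continue  # skip separators and section titles
        key, value = line[2:].split(": ", 1)
        fields[key.strip()] = value.strip()
    return fields


sample = """\
# darshan log version: 3.21
# exe: python3 cifar10-4-checkpoint.py
# jobid: 6041
# start_time: 1626196275
# end_time: 1626196561
# nprocs: 1
# run time: 287
"""

header = parse_header(sample)
# In this log the reported run time equals end_time - start_time + 1.
elapsed = int(header["end_time"]) - int(header["start_time"]) + 1
print(header["exe"], elapsed, header["run time"])
```

If the same fields come out of my own parser run, the log itself is
probably not the problem and the hang is in the summary step.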
>
> And a darshan-job-summary.pl that I built back in August 2020 generates
> a pdf for me in a few seconds.  I've attached it for you, but really we
> should figure out what's going on in your environment.
>
>
> ==rob
>
> On Tue, 2021-07-13 at 13:41 -0400, Jeffrey Layton wrote:
> > Good afternoon,
> >
> > Apologies for posting yet another problem :)  I'm trying to use
> > Darshan on a TensorFlow/Keras script. It's a simple model operating
> > on the CIFAR-10 data set (fairly small). Darshan produces the output
> > files but when I try to post-process one using
> > darshan-job-summary.pl, it hangs and I end up having to kill the
> > process (I waited about an hour - just to be sure).
> >
> > I run the script using the following:
> >
> > export DARSHAN_EXCLUDE_DIRS=/proc,/etc,/dev,/sys
> > env LD_PRELOAD=/home/laytonjb/bin/darshan-3.3.1/lib/libdarshan.so \
> >     python3 cifar10-4-checkpoint.py
> >
> > (I can provide the script if needed). It produces four files:
> >
> > $ ls -s
> > total 72
> >  4 laytonjb_ptxas_id6210-6210_7-13-47480-2131301613401632697_1.darshan
> >  4 laytonjb_ptxas_id6211-6211_7-13-47480-2131301613401632697_1.darshan
> > 60 laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan
> >  4 laytonjb_uname_id6056-6056_7-13-47475-2131301613401632697_1.darshan
> >
> >
> > I chose to post-process the "python3" output but this is where it
> > hangs. I'm attaching the darshan output file if that is of any help.
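
Since the run leaves several logs behind (python3 plus what look like
helper processes, uname and ptxas), picking the right one can be scripted.
This is a rough Python sketch; the file name pattern is inferred only from
the four names listed above, not from the Darshan documentation, and
logs_for_exe is a hypothetical helper.

```python
import re

# Pattern inferred from the four file names in this thread (an assumption).
LOG_NAME = re.compile(r"^(?P<user>[^_]+)_(?P<exe>.+)_id(?P<jobid>\d+)-.*\.darshan$")


def logs_for_exe(names, exe):
    """Return (name, jobid) pairs for logs whose executable field equals exe."""
    hits = []
    for name in names:
        m = LOG_NAME.match(name)
        if m and m.group("exe") == exe:
            hits.append((name, int(m.group("jobid"))))
    return hits


names = [
    "laytonjb_ptxas_id6210-6210_7-13-47480-2131301613401632697_1.darshan",
    "laytonjb_ptxas_id6211-6211_7-13-47480-2131301613401632697_1.darshan",
    "laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan",
    "laytonjb_uname_id6056-6056_7-13-47475-2131301613401632697_1.darshan",
]

print(logs_for_exe(names, "python3"))
```

The job id pulled from the python3 file name (6041) can then be checked
against the jobid field in the log header.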
> >
> > Thanks for any help.
> >
> > Jeff
> >
> >
> >
> >
> > _______________________________________________
> > Darshan-users mailing list
> > Darshan-users at lists.mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>