<div dir="ltr"><div>I added the --verbose flag (sorry, I should have thought of that earlier). job-summary still hangs. However, when I look at the directory, I see several PDFs of the individual plots, along with a number of .dat, .eps, .tex, and a few other files:</div><div><br></div><div><span style="font-family:monospace">$ ls -s<br>total 340<br> 4 access-hist-eps.gplt 8 file-access-read.pdf 4 file-access-write.dat 4 fs-data-table.tex 8 op-counts.pdf 4 posix-access-hist.dat 0 summary.log 8 time-summary.pdf<br> 4 access-table.tex 4 file-access-read-sh.dat 24 file-access-write.eps 4 job-table.tex 4 pattern.dat 28 posix-access-hist.eps 4 summary.tex 4 title.tex<br> 4 file-access-eps.gplt 24 file-access-shared.eps 8 file-access-write.pdf 4 latex.output 24 pattern.eps 8 posix-access-hist.pdf 4 time-summary.dat 4 variance-table.tex<br>20 file-access-read.dat 8 file-access-shared.pdf 4 file-access-write-sh.dat 28 op-counts.eps 4 pattern-eps.gplt 4 posix-op-counts.dat 24 time-summary.eps<br>24 file-access-read.eps 4 file-access-table.tex 4 file-count-table.tex 4 op-counts-eps.gplt 8 pattern.pdf 4 stdio-op-counts.dat 4 time-summary-eps.gplt</span></div><div><br></div><div><br></div><div>The summary.log file is empty, but the summary.tex file looks complete (there is an \end{document} at the end). I'm wondering if it gets stuck converting summary.tex to a PDF? 
Here are the pertinent processes:</div><div><br></div><div><br></div><div><span style="font-family:monospace">laytonjb 27458 5158 0 13:11 pts/2 00:00:00 perl /home/laytonjb/bin/darshan-3.3.1/bin/<a href="http://darshan-job-summary.pl">darshan-job-summary.pl</a> --verbose /home/laytonjb/darshan-logs/2021/7/13/laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan --output python3.pdf<br>laytonjb 27492 27458 0 13:11 pts/2 00:00:00 sh -c pdflatex "\def\inclstdio{1} \\def\inclperf{1} \\def\incompletelog{1} \\def\titlecmd{python3} \ \def\titlemon{7} \ \def\titlemday{13} \ \def\titleyear{2021} \ \def\titlecmdline{ python3 cifar10-4-checkpoint.py } \ \def\jobid{ 6041} \ \def\jobuid{ 1000} \ \def\jobnprocs{ 1} \ \def\jobruntime{ 287} \ \def\filecri{0.046549} \ \def\filecrbi{9.35267639160156} \ \def\filecwi{0.046681} \ \def\filecwbi{0.772393226623535} \ \def\filecrs{0} \ \def\filecrbs{0} \ \def\filecws{0} \ \def\filecwbs{0} \ \def\filecmi{0.020773} \ \def\filecms{0} \ \def\filecmi{0.020773} \ \def\perflayer{POSIX} \ \def\perfest{88.94} \ \def\perfbytes{10.1} \ \def\stdioperfest{50.66} \ \def\stdioperfbytes{0.0} \ \input{summary.tex}" \ -halt-on-error > latex.output<br>laytonjb 27493 27492 0 13:11 pts/2 00:00:00 pdflatex \def\inclstdio{1} \def\inclperf{1} \def\incompletelog{1} \def\titlecmd{python3} \def\titlemon{7} \def\titlemday{13} \def\titleyear{2021} \def\titlecmdline{ python3 cifar10-4-checkpoint.py } \def\jobid{ 6041} \def\jobuid{ 1000} \def\jobnprocs{ 1} \def\jobruntime{ 287} \def\filecri{0.046549} \def\filecrbi{9.35267639160156} \def\filecwi{0.046681} \def\filecwbi{0.772393226623535} \def\filecrs{0} \def\filecrbs{0} \def\filecws{0} \def\filecwbs{0} \def\filecmi{0.020773} \def\filecms{0} \def\filecmi{0.020773} \def\perflayer{POSIX} \def\perfest{88.94} \def\perfbytes{10.1} \def\stdioperfest{50.66} \def\stdioperfbytes{0.0} \input{summary.tex} -halt-on-error<br></span></div><div><br></div><div><br></div><div>(Apologies for the 
length).<br></div><div><br></div><div>While it creates some .pdf files, I'm wondering if there is a problem before pdflatex is called to process summary.tex? This is the version output from pdflatex:</div><div><br></div><div>$ pdflatex --version<br>pdfTeX 3.14159265-2.6-1.40.20 (TeX Live 2019/Debian)<br>kpathsea version 6.3.1<br>Copyright 2019 Han The Thanh (pdfTeX) et al.<br>There is NO warranty. Redistribution of this software is<br>covered by the terms of both the pdfTeX copyright and<br>the Lesser GNU General Public License.<br>For more information about these matters, see the file<br>named COPYING and the pdfTeX source.<br>Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.<br>Compiled with libpng 1.6.37; using libpng 1.6.37<br>Compiled with zlib 1.2.11; using zlib 1.2.11<br>Compiled with xpdf version 4.01<br></div><div><br></div><div><br></div><div>I'm guessing you and Rob are using CentOS? Ubuntu sometimes makes things difficult.<br></div><div><br></div><div>Thanks!</div><div><br></div><div><br></div><div>Jeff</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jul 14, 2021 at 10:17 AM Snyder, Shane <<a href="mailto:ssnyder@mcs.anl.gov">ssnyder@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Jeff,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I also tried running job-summary on your log using our current main branch (which is essentially Darshan 3.3.1), and it worked fine. I'm not sure exactly what the problem is, but it doesn't appear to be a general bug. You might be able to find some hints
about what's going wrong by running job-summary again with the '--verbose' flag, which preserves the temporary directory Darshan uses to create the PDF files, including the pdflatex logs. The 'summary.log'
file may contain error messages that indicate what is failing or hanging. It's not the most straightforward debugging strategy, but I don't have much else to suggest.<br>
</div>
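<div>A minimal sketch of one way to follow this suggestion, assuming GNU coreutils <span style="font-family:monospace">timeout</span> is installed. The hypothetical helper <span style="font-family:monospace">run_bounded</span> runs a command with stdin detached and a time limit, so a step that is waiting for terminal input exits instead of hanging. The helper name, the 60-second limit, and the directory path are illustrative and not part of Darshan; <span style="font-family:monospace">-interaction=nonstopmode</span> and <span style="font-family:monospace">-halt-on-error</span> are standard pdflatex options.</div>
<pre>
```shell
# Hypothetical helper (not part of Darshan): run a command with stdin
# detached and a time limit, so a program that is waiting for terminal
# input fails fast instead of hanging.
# GNU timeout exits with status 124 when the limit is reached.
run_bounded() {
    limit="$1"
    shift
    timeout "$limit" "$@" < /dev/null
}

# Example use inside the temporary directory kept by --verbose
# (the path is a placeholder; pass the same TeX input that job-summary
# passed to pdflatex, as shown in the process listing):
#   cd /path/to/kept/tmpdir
#   run_bounded 60 pdflatex -interaction=nonstopmode -halt-on-error ...
#   echo "exit status: $?"      # 124 would mean the time limit was hit
#   tail -n 20 latex.output summary.log
```
</pre>
<div>If the bounded run ends with an error instead of timing out, the tail of the log files is the first place to look; a status of 124 would suggest the step is blocked rather than failing.</div>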
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
As a side note, we are in the middle of developing new Darshan analysis tools based on PyDarshan that will hopefully be available before too long. There's a lot more development momentum on our end towards these new PyDarshan-based analysis tools, with the
older tools likely being deprecated once these are available. I just mention this for you and other users so you're aware help is on the way and that we aren't completely ignoring these issues.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
--Shane<br>
</div>
<div id="gmail-m_464647812496366509appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_464647812496366509divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Darshan-users <<a href="mailto:darshan-users-bounces@lists.mcs.anl.gov" target="_blank">darshan-users-bounces@lists.mcs.anl.gov</a>> on behalf of Jeffrey Layton <<a href="mailto:laytonjb@gmail.com" target="_blank">laytonjb@gmail.com</a>><br>
<b>Sent:</b> Tuesday, July 13, 2021 2:00 PM<br>
<b>To:</b> Latham, Robert J. <<a href="mailto:robl@mcs.anl.gov" target="_blank">robl@mcs.anl.gov</a>><br>
<b>Cc:</b> <a href="mailto:darshan-users@lists.mcs.anl.gov" target="_blank">darshan-users@lists.mcs.anl.gov</a> <<a href="mailto:darshan-users@lists.mcs.anl.gov" target="_blank">darshan-users@lists.mcs.anl.gov</a>><br>
<b>Subject:</b> Re: [Darshan-users] Hang on post-process</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Thanks Rob!! I appreciate the pdf (at least I won't look like a slacker and actually produced something).</div>
<div><br>
</div>
<div>What steps do you want to take to debug the issue? I'm guessing it's a configuration issue or dependency issue on my side. BTW - I'm running Ubuntu 20.04 on an AMD system. I built Darshan 3.3.1 using gcc 9.3.0 (Ubuntu 20.04 version).</div>
<div><br>
</div>
<div>Thanks!</div>
<div><br>
</div>
<div>Jeff</div>
<div><br>
</div>
</div>
<br>
<div>
<div dir="ltr">On Tue, Jul 13, 2021 at 2:46 PM Latham, Robert J. <<a href="mailto:robl@mcs.anl.gov" target="_blank">robl@mcs.anl.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Howdy Jeff: thanks for sending the log file<br>
<br>
It looks like a legitimate log file to me. `darshan-parser`, which<br>
simply dumps the counters and such to stdout, gives me<br>
reasonable-looking output. Here's the header:<br>
<br>
# darshan log version: 3.21<br>
# compression method: ZLIB<br>
# exe: python3 cifar10-4-checkpoint.py <br>
# uid: 1000<br>
# jobid: 6041<br>
# start_time: 1626196275<br>
# start_time_asci: Tue Jul 13 12:11:15 2021<br>
# end_time: 1626196561<br>
# end_time_asci: Tue Jul 13 12:16:01 2021<br>
# nprocs: 1<br>
# run time: 287<br>
# metadata: lib_ver = 3.3.1<br>
# metadata: h = romio_no_indep_rw=true;cb_nodes=4<br>
<br>
# log file regions<br>
# -------------------------------------------------------<br>
# header: 360 bytes (uncompressed)<br>
# job data: 543 bytes (compressed)<br>
# record table: 18164 bytes (compressed)<br>
# POSIX module: 41682 bytes (compressed), ver=4<br>
# STDIO module: 230 bytes (compressed), ver=2<br>
<br>
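A small sketch for sanity-checking the header above: in this log the reported run time (287) matches end_time - start_time + 1. The hypothetical helper `header_runtime` recomputes that value from header text on stdin; treating run time as end minus start plus one is an assumption inferred from these numbers.<br>
<pre>
```shell
# Hypothetical helper: read darshan-parser style header lines on stdin and
# print end_time - start_time + 1 for comparison with the "run time" line.
# The "+ 1" is an assumption inferred from the header values in this thread.
header_runtime() {
    awk '
        $2 == "start_time:" { start = $3 }
        $2 == "end_time:"   { finish = $3 }
        END {
            if (start != "") { if (finish != "") print finish - start + 1 }
        }
    '
}

# Using the two values from the header in this message:
printf '%s\n' '# start_time: 1626196275' '# end_time: 1626196561' | header_runtime
# prints 287
```
</pre>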
And a darshan-job-summary.pl that I built back in August 2020 generates<br>
a PDF for me in a few seconds. I've attached it for you, but really we<br>
should figure out what's going on in your environment.<br>
<br>
<br>
==rob<br>
<br>
On Tue, 2021-07-13 at 13:41 -0400, Jeffrey Layton wrote:<br>
> Good afternoon,<br>
> <br>
> Apologies for posting yet another problem :) I'm trying to use<br>
> Darshan on a TensorFlow/Keras script. It's a simple model operating<br>
> on the CIFAR-10 data set (fairly small). Darshan produces the output<br>
> files, but when I try to post-process one using<br>
> darshan-job-summary.pl, it hangs and I end up having to kill the<br>
> process (I waited about an hour, just to be sure).<br>
> <br>
> I run the script using the following:<br>
> <br>
> export DARSHAN_EXCLUDE_DIRS=/proc,/etc,/dev,/sys<br>
> env LD_PRELOAD=/home/laytonjb/bin/darshan-3.3.1/lib/libdarshan.so python3 cifar10-4-checkpoint.py<br>
> <br>
> (I can provide the script if needed). It produces four files:<br>
> <br>
> $ ls -s<br>
> total 72<br>
>  4 laytonjb_ptxas_id6210-6210_7-13-47480-2131301613401632697_1.darshan<br>
>  4 laytonjb_ptxas_id6211-6211_7-13-47480-2131301613401632697_1.darshan<br>
> 60 laytonjb_python3_id6041-6041_7-13-47475-2131301613401632697_1.darshan<br>
>  4 laytonjb_uname_id6056-6056_7-13-47475-2131301613401632697_1.darshan<br>
> <br>
> <br>
> I chose to post-process the "python3" output but this is where it<br>
> hangs. I'm attaching the darshan output file if that is of any help.<br>
> <br>
> Thanks for any help.<br>
> <br>
> Jeff<br>
</blockquote>
</div>
</div>
</div>
</blockquote></div>