<div dir="ltr">I also tried running the runjob command in verbose "TRACE" mode but did not see anything unusual there. The output is attached.</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 8, 2014 at 2:07 PM, Ketan Maheshwari <span dir="ltr">&lt;<a href="mailto:ketan@mcs.anl.gov" target="_blank">ketan@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I tried to get strace output with two methods:</div><div><br></div><div><div>stderr.txt: This was obtained by adding the "--strace 0" switch to the runjob command. It appears to exit normally after writing a fair amount of output.</div></div><div><br></div><div>strace.out: This one was obtained by wrapping the application executable with strace -o $HOME/strace.out ...</div><div><br></div><div>This one shows the process stuck, with the last line being:</div><div><br></div><div>waitpid(-1, %</div><div><br></div><div>Please find both attached.</div><div><br></div><div>I am trying to figure out why this one is stuck, but any insights would help.</div><div><br></div><div>Thanks,</div><div>Ketan</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 8, 2014 at 12:48 PM, Mihael Hategan <span dir="ltr">&lt;<a href="mailto:hategan@mcs.anl.gov" target="_blank">hategan@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>I would put an strace in _swiftwrap around the executable to see what<br>
keeps it from completing.<br>
<br>
Mihael<br>
<br>
On Mon, 2014-12-08 at 10:29 -0600, Ketan Maheshwari wrote:<br>
</span><div><div>> Hi Mihael, All,<br>
><br>
> Can you help debug this issue?<br>
><br>
> On BG/Q (Cetus), I am running LAMMPS with the coaster provider.<br>
><br>
> The symptom is that the LAMMPS task completes, but Swift still thinks it is<br>
> running and continues to show the "Active" status. The worker logs also show<br>
> that the task is running. The _wrapperlog is stalled in the EXECUTE stage.<br>
><br>
> The script (bg.sh) running in the qsub "script" mode invokes runjob and it<br>
> seems that the line after runjob is not reached, meaning runjob does not<br>
> return.<br>
><br>
> The same configuration (qsub in script mode and runjob with the same<br>
> parameters), when run outside of Swift, seems to work (i.e., the script<br>
> exits on completion).<br>
><br>
> Attaching the bg.sh, and a tarball with Swift run dir and worker log in<br>
> DEBUG mode.<br>
><br>
> Thanks for any help in debugging this further.<br>
><br>
> Best,<br>
> Ketan<br>
<br>
<br>
</div></div>_______________________________________________<br>
Swift-devel mailing list<br>
<a href="mailto:Swift-devel@ci.uchicago.edu" target="_blank">Swift-devel@ci.uchicago.edu</a><br>
<a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel</a><br>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>