[Swift-devel] Lammps on BGQ: task completes but status shows active

Ketan Maheshwari ketan at mcs.anl.gov
Mon Dec 8 15:10:03 CST 2014


I also ran the runjob command in verbose "TRACE" mode but did not see
anything unusual there. The output is attached.
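
For reference, the strace wrapping described in the quoted message below
(strace -o $HOME/strace.out around the app executable) could be extended
roughly as follows. Since that trace ends in a blocked waitpid(-1, ...), the
traced process is waiting on a child, so following children with -f may show
which one never exits. This is only a sketch: run_traced and its arguments
are hypothetical names, not part of Swift, _swiftwrap, or runjob.

```shell
#!/bin/bash
# Sketch only: run_traced is a hypothetical helper, not part of Swift,
# _swiftwrap, or runjob.
run_traced() {
    local trace_out="$1"
    shift
    if command -v strace >/dev/null 2>&1; then
        # -f  follow child processes created by the traced command
        # -tt prefix each line with a wall-clock timestamp (microseconds)
        # -o  write the trace to a file instead of stderr
        strace -f -tt -o "$trace_out" "$@"
    else
        # Fall back to running the command untraced if strace is missing.
        echo "strace not found; running untraced" >&2
        "$@"
    fi
}

# Usage (the executable name and arguments are placeholders):
#   run_traced "$HOME/strace.out" ./app_exe arg1 arg2
```

With -f and a single -o file, each trace line is prefixed with the PID that
made the call, which helps match the blocked waitpid to the child it is
waiting for.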

On Mon, Dec 8, 2014 at 2:07 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:

> I tried to get strace output with two methods:
>
> stderr.txt: This was obtained by adding the "--strace 0" switch to the
> runjob command. It appears to exit normally after writing a fair amount
> of output.
>
> strace.out: This one was obtained by wrapping the app exe with strace -o
> $HOME/strace.out  ...
>
> This one shows the output stuck, with the last line being:
>
> waitpid(-1, %
>
> Please find both attached.
>
> I am trying to figure out why this one is stuck, but any insights would help.
>
> Thanks,
> Ketan
>
> On Mon, Dec 8, 2014 at 12:48 PM, Mihael Hategan <hategan at mcs.anl.gov>
> wrote:
>
>> I would put an strace in _swiftwrap around the executable to see what
>> keeps it from completing.
>>
>> Mihael
>>
>> On Mon, 2014-12-08 at 10:29 -0600, Ketan Maheshwari wrote:
>> > Hi Mihael, All,
>> >
>> > Can you help debug this issue?
>> >
>> > On BG/Q (cetus), running lammps with provider coaster.
>> >
>> > The symptom is that the lammps task completes but Swift still thinks it is
>> > running and continues to show "Active" status. Worker logs also show that
>> > the task is running. The _wrapperlog is stalled in EXECUTE stage.
>> >
>> > The script (bg.sh) running in the qsub "script" mode invokes runjob and it
>> > seems that the line after runjob is not reached, meaning runjob does not
>> > return.
>> >
>> > The same configuration (qsub in script mode and runjob with the same
>> > parameters), when run outside of Swift, seems to work (i.e. the script
>> > exits on completion).
>> >
>> > Attaching the bg.sh, and a tarball with Swift run dir and worker log in
>> > DEBUG mode.
>> >
>> > Thanks for any help in debugging this further.
>> >
>> > Best,
>> > Ketan
>>
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trace.stderr.txt.gz
Type: application/x-gzip
Size: 776970 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20141208/d606a9bf/attachment.bin>