[Swift-devel] process hogging memory on ranger login
Mihael Hategan
hategan at mcs.anl.gov
Wed Nov 25 11:38:09 CST 2009
On Wed, 2009-11-25 at 11:33 -0600, skenny at uchicago.edu wrote:
> so, my (~1 million job) workflow, submitted to ranger hangs in
> this state:
>
> Progress: Submitted:16383 Finished successfully:55681
> Progress: Submitted:16383 Finished successfully:55681
>
> on ranger i have nothing in the queue. but i am showing a
> process still running on login3:
>
> 8825 tg457040 28 12 472m 232m 5660 S 15.8 0.7 130:56.41 java
>
> i am showing some errors in the stderr.txt of the jobs that
> were running (they access our database which apparently went
> down at some point). however, it seems troubling that when the
> app fails that coaster job is still running on the remote site
> and the workflow hangs w/o reporting anything...
>
> the log is too large to attach, but is here on ci:
>
> /ci/projects/cnari/logs/skenny/importDTI-20091124-1655-agj0mze1.log
>
> let me know if you need the coaster log as well.
That may occur at times, such as when the service runs out of memory. So
yes, I do need the coaster log.
Regardless of the exact reason, I think there needs to be extra logic
in there to ensure liveness. In other words, a lost state should be
interpreted not as "still in the last state" but as "failure".
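The liveness rule above can be sketched as a small watchdog. This is a minimal, hypothetical illustration, not the actual Swift or coaster code: the class `LivenessWatchdog`, its `update`/`sweep`/`stateOf` methods, the state names and the timeout value are all assumptions. Each status notification (or heartbeat) refreshes a job's timestamp. A periodic sweep moves any non-terminal job that has been silent longer than the timeout to FAILED, instead of leaving it in its last reported state (e.g. "Submitted").

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the Swift/coaster implementation.
// Tracks each job's last reported state and when it was last heard from.
class LivenessWatchdog {
    enum State { SUBMITTED, ACTIVE, COMPLETED, FAILED }

    private static final class Entry {
        State state;
        long lastHeardMillis;

        Entry(State state, long lastHeardMillis) {
            this.state = state;
            this.lastHeardMillis = lastHeardMillis;
        }
    }

    private final long timeoutMillis;
    private final Map<String, Entry> jobs = new HashMap<>();

    LivenessWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a status notification or heartbeat for a job at time nowMillis.
    void update(String jobId, State state, long nowMillis) {
        jobs.put(jobId, new Entry(state, nowMillis));
    }

    // Treat a lost state as failure: any non-terminal job silent for longer
    // than the timeout is moved to FAILED. Returns how many jobs this sweep failed.
    int sweep(long nowMillis) {
        int failed = 0;
        for (Entry e : jobs.values()) {
            boolean terminal = e.state == State.COMPLETED || e.state == State.FAILED;
            if (!terminal && nowMillis - e.lastHeardMillis > timeoutMillis) {
                e.state = State.FAILED;
                failed++;
            }
        }
        return failed;
    }

    // Current state of a job, or null if the job was never seen.
    State stateOf(String jobId) {
        Entry e = jobs.get(jobId);
        return e == null ? null : e.state;
    }
}
```

Time is passed in explicitly so the logic is deterministic. A real client would supply a clock reading and could drive `sweep` periodically, for example from a `ScheduledExecutorService`. With a 60-second timeout, a job last heard from 70 seconds ago is failed, so the workflow can report an error rather than hang at a constant progress count.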