[Swift-devel] process hogging memory on ranger login

Mihael Hategan hategan at mcs.anl.gov
Wed Nov 25 11:38:09 CST 2009


On Wed, 2009-11-25 at 11:33 -0600, skenny at uchicago.edu wrote:

> so, my (~1 million job) workflow, submitted to ranger hangs in
> this state:
> 
> Progress:  Submitted:16383  Finished successfully:55681
> Progress:  Submitted:16383  Finished successfully:55681
> 
> on ranger i have nothing in the queue. but i am showing a
> process still running on login3:
>  
>  8825 tg457040  28  12  472m 232m 5660 S 15.8  0.7 130:56.41 java
> 
> i am showing some errors in the stderr.txt of the jobs that
> were running (they access our database which apparently went
> down at some point). however, it seems troubling that when the
> app fails that coaster job is still running on the remote site
> and the workflow hangs w/o reporting anything...
> 
> the log is too large to attach, but is here on ci:
> 
> /ci/projects/cnari/logs/skenny/importDTI-20091124-1655-agj0mze1.log
> 
> let me know if you need the coaster log as well.

That kind of hang can happen occasionally, for example when the coaster
service runs out of memory. So yes, I do need the coaster log.

Regardless of the exact reason, I think there needs to be extra
logic in there to ensure liveness. In other words, a lost state should
be interpreted as "failure", not as "still in last state".
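
As a rough illustration of that idea (this is not the actual coaster
code; the class name, the state names, and the timeout are all made up
for the example), a status tracker could record when each job last
reported in and treat prolonged silence as a failure instead of
repeating the last known state:

```python
import time

# Hypothetical state names; terminal states never time out.
TERMINAL = {"COMPLETED", "FAILED"}


class JobTracker:
    """Remembers each job's last reported state and when it arrived."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed liveness window, in seconds
        self.clock = clock          # injectable so tests need not sleep
        self._last = {}             # job_id -> (state, time of report)

    def report(self, job_id, state):
        """Record a status update received from the remote side."""
        self._last[job_id] = (state, self.clock())

    def effective_state(self, job_id):
        """Return the state to act on, not just the last one seen."""
        state, seen_at = self._last[job_id]
        if state in TERMINAL:
            return state
        # No update within the window: the state is lost, so report
        # failure instead of "still in last state".
        if self.clock() - seen_at > self.timeout_s:
            return "FAILED"
        return state
```

With something like this, a job stuck in "Submitted" after the remote
service dies would eventually show up as failed, and the workflow could
retry or abort rather than hang.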



