[Swift-devel] process hogging memory on ranger login

skenny at uchicago.edu
Wed Nov 25 11:33:08 CST 2009


>> so, i was trying to re-run this workflow with the latest swift
>> (swift-r3191 cog-r2620) to try and replicate the error.
>> however, a new error has surfaced...the environment, as
>> specified in my tc.data file, is no longer being set by swift
>> on the remote end. is it possible this is due to recent
>> changes in swift? i am running the same workflow, same tc &
>> sites files with the newer swift and am getting errors (from
>> the app) due to my LD_LIBRARY_PATH not being set. if i switch
>> back to swift-r3116 cog-r2482, the error goes away.
>> 
>> 
>
>This was a bug introduced earlier during some changes to how the
>profile stuff was handled. Should be fixed in swift r3192.

cool, this seems to be fixed, thanks mihael! now i'm able to
replicate the error with the latest swift. 

so, my (~1 million job) workflow, submitted to ranger hangs in
this state:

Progress:  Submitted:16383  Finished successfully:55681
Progress:  Submitted:16383  Finished successfully:55681

on ranger i have nothing in the queue. but i am showing a
process still running on login3:
 
 8825 tg457040  28  12  472m 232m 5660 S 15.8  0.7 130:56.41 java

i am showing some errors in the stderr.txt of the jobs that
were running (they access our database which apparently went
down at some point). however, it seems troubling that when the
app fails, the coaster job is still running on the remote site
and the workflow hangs w/o reporting anything...
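
a hang like this shows up on the console as consecutive Progress lines with identical counts. as a rough sketch (not part of swift), a small script could flag that condition from captured console output. it assumes each line starts with "Progress:" followed by "state:count" pairs as in the two lines quoted above; the function names and the `repeats` threshold are hypothetical, chosen only for illustration.

```python
import re

# Matches "state:count" pairs such as "Finished successfully:55681".
# Assumption: state names contain only letters and spaces.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return a dict of state counts from a 'Progress:' line, else None."""
    if not line.startswith("Progress:"):
        return None
    body = line[len("Progress:"):]
    return {name.strip(): int(count) for name, count in PAIR.findall(body)}


def is_stalled(lines, repeats=2):
    """True if the last `repeats` Progress lines carry identical counts.

    `repeats` is a hypothetical threshold; a real check would likely
    also want a minimum elapsed time between the compared lines.
    """
    snapshots = [p for p in map(parse_progress, lines) if p is not None]
    if len(snapshots) < repeats:
        return False
    tail = snapshots[-repeats:]
    return all(s == tail[0] for s in tail)


if __name__ == "__main__":
    console = [
        "Progress:  Submitted:16383  Finished successfully:55681",
        "Progress:  Submitted:16383  Finished successfully:55681",
    ]
    print(is_stalled(console))
```

identical counts alone do not prove a hang (long-running jobs produce the same pattern), so this is only a hint that the remote queue and the coaster process are worth checking.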

the log is too large to attach, but is here on ci:

/ci/projects/cnari/logs/skenny/importDTI-20091124-1655-agj0mze1.log

let me know if you need the coaster log as well.

~sk


