[Swift-devel] Coasters and std's on ranger

Michael Wilde wilde at mcs.anl.gov
Tue Jul 14 14:46:17 CDT 2009


So are any of the following reasonable ways to proceed?

1) Develop an SGE provider (hopefully heavily based on the PBS provider) 
and run on Ranger locally.

2) Debug getting Coasters, GRAM and SGE to coexist nicely (i.e., the 
debugging route in progress now)

3) Start the coaster service manually in one block allocation and have 
it rendezvous with Swift

For (2), can we create a GRAM test job outside of Swift that we can 
debug, to try to find a set of GRAM options that work? I need to read 
the thread more carefully, but I don't understand whether the problem is 
in Ranger SGE, the GRAM SGE jobmanager, or the interaction between them.
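
As a starting point for such a test job, here is a rough sketch (not 
something that has been run against Ranger) that generates pre-WS GRAM 
RSL strings covering the stdout/stderr combinations we might want to 
try by hand. The helper names (rsl, stdio_variants), the test 
executable /bin/hostname and the file names test.out/test.err are all 
made up for illustration; only the RSL attribute names executable, 
stdout and stderr are standard GRAM ones.

```python
from itertools import product


def rsl(executable, **attrs):
    """Build a pre-WS GRAM RSL string from attribute/value pairs.

    Attributes whose value is None are left out, so the jobmanager
    default applies for them.
    """
    parts = ['(executable="{}")'.format(executable)]
    for name, value in attrs.items():
        if value is not None:
            parts.append('({}="{}")'.format(name, value))
    return "&" + "".join(parts)


def stdio_variants(executable="/bin/hostname"):
    """Yield one RSL string per stdout/stderr combination to test.

    None means "do not mention the attribute at all". The file names
    are hypothetical placeholders.
    """
    stdout_choices = [None, "/dev/null", "test.out"]
    stderr_choices = [None, "/dev/null", "test.err"]
    for out, err in product(stdout_choices, stderr_choices):
        yield rsl(executable, stdout=out, stderr=err)


if __name__ == "__main__":
    # Print the candidates; each one would be submitted manually to
    # the SGE jobmanager and the outcome noted.
    for candidate in stdio_variants():
        print(candidate)
```

Each printed string could then be submitted by hand to the GRAM SGE 
jobmanager, which might help show whether the failure depends on how 
stdout/stderr are specified or happens regardless.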

I'll re-read the thread first before asking for more clarification; I 
didn't get it on first read.

- Mike


On 7/14/09 2:33 PM, Mihael Hategan wrote:
> This isn't the app stdout/stderr, but the job stdout/stderr. They are
> redirected in coasters for debugging/accounting purposes, and with SGE
> because the [censored] thing doesn't work otherwise.
> 
> On Tue, 2009-07-14 at 14:29 -0500, Michael Wilde wrote:
>> Will the current code work for Swift programs that don't use stdout or 
>> stderr? (I.e., where the app wrappers redirect these to a file?)
>>
>> - Mike
>>
>> On 7/14/09 2:21 PM, Mihael Hategan wrote:
>>> On Tue, 2009-07-14 at 13:22 -0500, skenny at uchicago.edu wrote:
>>>> darn...
>>>>
>>>> Execution failed:
>>>>         Exception in RInvoke:
>>>> Arguments: [scripts/4reg_dummy.R,
>>>> matrices/4_reg/network1/gestspeech.cov, 29, 0.5, speech]
>>>> Host: RANGER
>>>> Directory:
>>>> 4reg_speech-20090714-1309-b650zi68/jobs/3/RInvoke-351ksndj
>>>> stderr.txt:
>>>>
>>>> stdout.txt:
>>>>
>>>> ----
>>>>
>>>> Caused by:
>>>>         Block task failed: 0714-090151-000000Block task ended
>>>> prematurely
>>>>
>>>> Cleaning up...
>>>> Shutting down service at https://129.114.50.163:38571
>>>>
>>>> i can file a bug report with TG if need be, but i'm not quite
>>>> sure the best thing to tell them (?) also, i'm wondering how
>>>> coasters was previously able to work around this bug? 
>>> By redirecting stdout+stderr to memory, but that causes the "job manager
>>> could not stage out a file" problem.
>>>
>>>
> 


