[Swift-user] Re: [Swift-devel] trouble resuming

skenny at uchicago.edu skenny at uchicago.edu
Fri Sep 25 13:30:14 CDT 2009


ok, i see what you're saying...it's 'theoretically' possible,
but how to actually tell swift to do it is the tricky bit ;) 

don't know if this is helpful for figuring out a way to do so,
but i tried the following:

type file;
type Rscript;
type mxModel;

app (external min) mxModelProcessor(file covMatrix, Rscript mxModProc,
        int modnum, float weight, string cond, int net)
{
        RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight cond net;
}

file covMatrix<single_file_mapper;file=@strcat("matrices/4_reg/network1/speech.cov")>;
Rscript mxScript<single_file_mapper;file=@strcat("scripts/dbtest.R")>;

external dbdone[];
int totalperms[] = [1:200];
float initweight = .5;
int net = 1;
foreach perm in totalperms {
        dbdone[perm] = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech", net);
        trace(@dbdone[perm]);
}

in order to test restart, i made the workflow die by deleting
the remote db table it's trying to access while the workflow
was still running. in this case, it looks like nothing is
written to the rlog (w/the exception of its timestamp).

the trace spits out something like this:

SwiftScript trace:
_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array//elt-4
SwiftScript trace:
_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array/h24//elt-124
SwiftScript trace:
_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array/h9//elt-84
SwiftScript trace:
_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array//elt-12

...
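
to compare trace output like the above against the jobs that
staged out, the array index can be pulled from each traced
path. a minimal python sketch, separate from the workflow
itself: it assumes the trailing elt-N component is the index
into dbdone[] (the perm value), which the output suggests but
which is not confirmed here. the function names are made up
for illustration.

```python
import re

# Matches the trailing "elt-<n>" component of a traced path.
# Assumption: <n> is the index into dbdone[] (the perm value).
ELT_RE = re.compile(r"elt-(\d+)\s*$")


def traced_indices(lines):
    """Return the sorted array indices found in trace output lines."""
    found = set()
    for line in lines:
        match = ELT_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def missing_indices(traced, first=1, last=200):
    """Return indices in [first, last] that never showed up in the trace."""
    return sorted(set(range(first, last + 1)) - set(traced))


sample = [
    "SwiftScript trace:",
    "_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array//elt-4",
    "SwiftScript trace:",
    "_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array/h24//elt-124",
    "SwiftScript trace:",
    "_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array/h9//elt-84",
    "SwiftScript trace:",
    "_concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array//elt-12",
]

done = traced_indices(sample)
todo = missing_indices(done)
```

the default range 1..200 mirrors totalperms in the script; a
traced index only shows that the trace ran for that element,
not that the db write itself succeeded.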

swift does print a successful 'stage out' for the jobs that
successfully completed. 

again, i'm not sure if this is helpful, but thought it was
worth sharing...log attached. 

~sk

---- Original message ----
>Date: Wed, 23 Sep 2009 14:48:01 -0500
>From: Mihael Hategan <hategan at mcs.anl.gov>  
>Subject: Re: [Swift-devel] trouble resuming  
>To: skenny at uchicago.edu
>Cc: Michael Andric <andric at uchicago.edu>,
>    swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu
>
>On Wed, 2009-09-23 at 02:55 -0500, skenny at uchicago.edu wrote:
>> i think the main issue is that the rlog only contains
>> thread id's/mappings for files and not externals (even if
>> that's all you return). 
>> 
>> e.g. the rlog will contain something like: 
>> 
>> null.!unmapped
>> null.!unmapped
>> null.!unmapped
>> null.!unmapped
>> null.!unmapped
>> 
>> ... 
>> 
>> if externals could be logged, i think the code below would
>> still need to have loop_query return its external in order for
>> that to work properly...regardless though, i don't *think*
>> jobs relying entirely on externals can be resumed in swift,
>> but maybe mihael will tell me i'm wrong and that there's a
>> magical solution ;)
>> 
>
>I can't so far see anything major that would prevent externals from
>keeping consistency on a run. Externals are a way to tell swift that the
>data management for certain data shouldn't be done by swift. Assuming
>that said data management is done "properly", it is equivalent to swift
>doing it.
>
>So yeah, I think you might be wrong there :)
>
>Now, the implementation, that's another story. I'll have to look into
>that.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: semtest-20090925-1316-o4co0x47.log
Type: application/octet-stream
Size: 2475528 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20090925/3c9830a7/attachment.obj>

