From marcin at galton.uchicago.edu Sun Sep 20 22:17:58 2009
From: marcin at galton.uchicago.edu (Marcin Hitczenko)
Date: Sun, 20 Sep 2009 22:17:58 -0500 (CDT)
Subject: [Swift-user] swift and R
Message-ID: <3793.207.181.247.22.1253503078.squirrel@galton.uchicago.edu>

Hi,

I am trying to use swift in order to run an R script on various datasets
on jazz. For each run, I would like to return some output by writing data
to a file. I have tried to do this in two ways, and while one works I was
told this was a potentially risky way of doing things.

What I try to do is to feed all relevant filenames to R (location of data
and location of output file) through swift as described on
http://www.ci.uchicago.edu/~erin/rguide/rforswift.php.

However, I run into two problems. Swift won't run because it complains
that the file to which I want to write the data does not exist (which is
true, but I would like to not have to create these files beforehand).

However, even when I create the (empty) file, nothing gets written to it.
I check the R log file and it seems that the data is written to something,
but when I check the file I created, it is still empty.

The method that works is for me to just feed the datafile to R via swift
and within the R program itself declare the location to write to. Like I
said before, I was told this was not ideal, but I am not sure how to fix
the problems described above?

Also, I noticed that a directory forms during the running of the swift
code, which I thought used to automatically be removed (except when there
was an error) at the completion of the run. This folder has not been
disappearing at completion (based on .out file). Am I not remembering
correctly that this should be a temporary directory? or am I doing
something incorrect?

Sorry for the long email, but I figured I should at least partially detail
what I am trying to do/having problems with.

Thank you for your time and effort.
Best,
Marcin

From HodgessE at uhd.edu Mon Sep 21 12:48:22 2009
From: HodgessE at uhd.edu (Hodgess, Erin)
Date: Mon, 21 Sep 2009 12:48:22 -0500
Subject: [Swift-user] swift and R
References: <3793.207.181.247.22.1253503078.squirrel@galton.uchicago.edu>
Message-ID: <70A5AC06FDB5E54482D19E1C04CDFCF307C375FC@BALI.uhd.campus>

Hi Marcin!

Could you send out the swift and R files, please?

Thanks,
Erin

Erin M. Hodgess, PhD
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: hodgesse at uhd.edu

-----Original Message-----
From: swift-user-bounces at ci.uchicago.edu on behalf of Marcin Hitczenko
Sent: Sun 9/20/2009 10:17 PM
To: swift-user at ci.uchicago.edu
Subject: [Swift-user] swift and R

_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Mon Sep 21 13:07:03 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 21 Sep 2009 13:07:03 -0500
Subject: [Swift-user] swift and R
In-Reply-To: <3793.207.181.247.22.1253503078.squirrel@galton.uchicago.edu>
References: <3793.207.181.247.22.1253503078.squirrel@galton.uchicago.edu>
Message-ID: <1253556423.4309.5.camel@localhost>

On Sun, 2009-09-20 at 22:17 -0500, Marcin Hitczenko wrote:
> However, I run into two problems. Swift won't run because it complains
> that the file to which I want to write the data does not exist (which is
> true, but I would like to not have to create these files beforehand).

I suspect swift complains that the R invocation did not generate the file
it should have. Can you paste the exact error message?

> However, even when I create the (empty) file, nothing gets written to it.
> I check the R log file and it seems that the data is written to something,
> but when I check the file I created, it is still empty.
>
> The method that works is for me to just feed the datafile to R via swift
> and within the R program itself declare the location to write to.
> Like I said before, I was told this was not ideal, but I am not sure how
> to fix the problems described above?
>
> Also, I noticed that a directory forms during the running of the swift
> code,

More than one directory is created during a swift run. Which one are you
referring to (i.e. paste the exact pathname)?

Mihael

From andric at uchicago.edu Mon Sep 21 16:28:28 2009
From: andric at uchicago.edu (Michael Andric)
Date: Mon, 21 Sep 2009 16:28:28 -0500
Subject: [Swift-user] trouble resuming
Message-ID:

I'm having trouble resuming swift-jobs. When resuming, it goes through
'Initializing' every single job in the workflow and just finishes without
actually picking up where it left off. Below is the swift script.

Thanks
Michael

## type declarations:
type file{}
type Rscript;

## Mediator app declaration:
app (external turn) run_query (string med_args, file config, Rscript code, file Annot){
    Mediator med_args @filename(code) @filename(Annot);
}

## this process sets parameters and calls Mediator:
loop_query(int vert, string user, string db, string host, string query_outline, Rscript code, file config, string subject, string h, int beginTS, int endTS, file Annot){
    string outPrefix = @strcat("gest_vs_nogest_vert",vert,h);
    string med_args = @strcat("--user ","andric"," --conf ", @filename(config)," --db ", db," --host ", host,
        " --vox ", vert," --subject ", subject," --subquery tsTSVAR"," --begin_ts ",beginTS," --end_ts ",endTS,
        " --query ", query_outline," --r_swift_args ",outPrefix," ",vert," ",h," ",subject, " --outprefix ", "FAH_Q", " --r_script ", @filename(code));
    external turnpt = run_query(med_args, config, code, Annot);
}

## needed parameters to use Mediator:
string user = @arg("user");
string db = "HEL";
string host = "tp-neurodb.ci.uchicago.edu";
file config;

## mapping the R code:
Rscript code;
file Annot;

## variables to move across in the foreach loops:
string declarelist[] = ["ss2"];
string hemilist[] = ["rh"];
int vertices[] = [1:155991:1];
#int vertices[] = [0:1:1];

foreach subject in declarelist{
    foreach h in hemilist{
        int beginTS = 0;
        int endTS = 1254;
        string query_outline = @strcat("SELECT SUBQUERY FROM ",subject,"TS_data",h," WHERE subject = '",subject,"' AND vertex=VOX");
        foreach vert in vertices{
            loop_query(vert, user, db, host, query_outline, code, config, subject, h, beginTS, endTS, Annot);
        }
    }
}

From skenny at uchicago.edu Wed Sep 23 02:55:55 2009
From: skenny at uchicago.edu (skenny at uchicago.edu)
Date: Wed, 23 Sep 2009 02:55:55 -0500 (CDT)
Subject: [Swift-user] Re: [Swift-devel] trouble resuming
Message-ID: <20090923025555.CCT43329@m4500-02.uchicago.edu>

i think the main issue is that the rlog only contains
thread id's/mappings for files and not externals (even if
that's all you return).

e.g. the rlog will contain something like:

null.!unmapped
null.!unmapped
null.!unmapped
null.!unmapped
null.!unmapped

...

if externals could be logged, i think the code below would
still need to have loop_query return its external in order for
that to work properly...regardless though, i don't *think*
jobs relying entirely on externals can be resumed in swift,
but maybe mihael will tell me i'm wrong and that there's a
magical solution ;)

~sk

---- Original message ----
>Date: Mon, 21 Sep 2009 16:28:28 -0500
>From: Michael Andric
>Subject: [Swift-devel] trouble resuming
>To: swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu
>
> I'm having trouble resuming swift-jobs. When resuming, it goes through
> 'Initializing' every single job in the workflow and just finishes without
> actually picking up where it left off. Below is the swift script.
> Thanks
> Michael
> ## type declarations:
> type file{}
> type Rscript;
> ## Mediator app declaration:
> app (external turn) run_query (string med_args, file config, Rscript code, file Annot){
>     Mediator med_args @filename(code) @filename(Annot);
> }
> ## this process sets parameters and calls Mediator:
> loop_query(int vert, string user, string db, string host, string query_outline, Rscript code, file config, string subject, string h, int beginTS, int endTS, file Annot){
>     string outPrefix = @strcat("gest_vs_nogest_vert",vert,h);
>     string med_args = @strcat("--user ","andric"," --conf ", @filename(config)," --db ", db," --host ", host,
>         " --vox ", vert," --subject ", subject," --subquery tsTSVAR"," --begin_ts ",beginTS," --end_ts ",endTS,
>         " --query ", query_outline," --r_swift_args ",outPrefix," ",vert," ",h," ",subject, " --outprefix ", "FAH_Q", " --r_script ", @filename(code));
>     external turnpt = run_query(med_args, config, code, Annot);
> }
> ## needed parameters to use Mediator:
> string user = @arg("user");
> string db = "HEL";
> string host = "tp-neurodb.ci.uchicago.edu";
> file config;
> ## mapping the R code:
> Rscript code file="Rturning/turnchi_ss2.R">;
> file Annot file="Rturning/resampled_coding_CarStory.txt">;
> ## variables to move across in the foreach loops:
> string declarelist[] = ["ss2"];
> string hemilist[] = ["rh"];
> int vertices[] = [1:155991:1];
> #int vertices[] = [0:1:1];
> foreach subject in declarelist{
>     foreach h in hemilist{
>         int beginTS = 0;
>         int endTS = 1254;
>         string query_outline = @strcat("SELECT SUBQUERY FROM ",subject,"TS_data",h," WHERE subject = '",subject,"' AND vertex=VOX");
>         foreach vert in vertices{
>             loop_query(vert, user, db, host, query_outline, code, config, subject, h, beginTS, endTS, Annot);
>         }
>     }
> }
>_______________________________________________
>Swift-devel mailing list
>Swift-devel at ci.uchicago.edu
>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From hategan at mcs.anl.gov Wed Sep 23 14:48:01 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 23 Sep 2009 14:48:01 -0500
Subject: [Swift-user] Re: [Swift-devel] trouble resuming
In-Reply-To: <20090923025555.CCT43329@m4500-02.uchicago.edu>
References: <20090923025555.CCT43329@m4500-02.uchicago.edu>
Message-ID: <1253735281.381.3.camel@localhost>

On Wed, 2009-09-23 at 02:55 -0500, skenny at uchicago.edu wrote:
> i think the main issue is that the rlog only contains
> thread id's/mappings for files and not externals (even if
> that's all you return).
>
> e.g. the rlog will contain something like:
>
> null.!unmapped
> null.!unmapped
> null.!unmapped
> null.!unmapped
> null.!unmapped
>
> ...
>
> if externals could be logged, i think the code below would
> still need to have loop_query return its external in order for
> that to work properly...regardless though, i don't *think*
> jobs relying entirely on externals can be resumed in swift,
> but maybe mihael will tell me i'm wrong and that there's a
> magical solution ;)
>

I can't so far see anything major that would prevent externals from
keeping consistency on a run. Externals are a way to tell swift that the
data management for certain data shouldn't be done by swift. Assuming
that said data management is done "properly", it is equivalent to swift
doing it.

So yeah, I think you might be wrong there :)

Now, the implementation, that's another story. I'll have to look into
that.
From skenny at uchicago.edu Fri Sep 25 13:30:14 2009
From: skenny at uchicago.edu (skenny at uchicago.edu)
Date: Fri, 25 Sep 2009 13:30:14 -0500 (CDT)
Subject: [Swift-user] Re: [Swift-devel] trouble resuming
Message-ID: <20090925133014.CCW92280@m4500-02.uchicago.edu>

ok, i see what you're saying...it's 'theoretically' possible,
but how to actually tell swift to do it is the tricky bit ;)

don't know if this is helpful for figuring out a way to do so, but i
tried the following:

type file;
type Rscript;
type mxModel;

app (external min) mxModelProcessor(file covMatrix, Rscript mxModProc, int modnum, float weight, string cond, int net)
{
    RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight cond net;
}

file covMatrix;
Rscript mxScript;

external dbdone[];
int totalperms[] = [1:200];
float initweight = .5;
int net = 1;

foreach perm in totalperms{
    dbdone[perm] = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech", net);
    trace(@dbdone[perm]);
}

in order to test restart, i made the workflow die by deleting
the remote db table it's trying to access while the workflow
was still running. in this case, it looks like nothing is
written to the rlog (w/the exception of its timestamp).

the trace spits out something like this:

SwiftScript trace: _concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array//elt-4
SwiftScript trace: _concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array/h24//elt-124
SwiftScript trace: _concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array/h9//elt-84
SwiftScript trace: _concurrent/dbdone-d664f24e-673d-47e2-bd83-69027de4928a--array//elt-12
...

swift does print a successful 'stage out' for the jobs that
successfully completed.

again, i'm not sure if this is helpful, but thought it was worth
sharing...log attached.
~sk

-------------- next part --------------
A non-text attachment was scrubbed...
Name: semtest-20090925-1316-o4co0x47.log
Type: application/octet-stream
Size: 2475528 bytes
Desc: not available
URL:

From hategan at mcs.anl.gov Fri Sep 25 13:57:43 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 25 Sep 2009 13:57:43 -0500
Subject: [Swift-user] Re: [Swift-devel] trouble resuming
In-Reply-To: <20090925133014.CCW92280@m4500-02.uchicago.edu>
References: <20090925133014.CCW92280@m4500-02.uchicago.edu>
Message-ID: <1253905063.1765.2.camel@localhost>

On Fri, 2009-09-25 at 13:30 -0500, skenny at uchicago.edu wrote:
> ok, i see what you're saying...it's 'theoretically' possible,
> but how to actually tell swift to do it is the tricky bit ;)

Right. I suspect the problem is that external variables don't have
mappers that implement things properly. I'd file a bug report.

>
> in order to test restart, i made the workflow die by deleting
> the remote db table it's trying to access while the workflow
> was still running.

If you mess with intermediate data, whether external or not, even if
swift resumes, things are going to be in an inconsistent state.

> in this case, it looks like nothing is
> written to the rlog (w/the exception of its timestamp).

Right. Things are only rlogged when the application is successful.
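
A minimal, untested sketch of one possible workaround, assuming (as skenny
observes above) that the rlog records mappings for files but not for
externals: each invocation returns a small mapped marker file instead of an
external, so a successful job leaves an entry in the rlog that a resumed run
could match. It further assumes the R script is changed to write the file
whose path is passed as its last argument (for example with writeLines() on
the last element of commandArgs(trailingOnly = TRUE)). The input file names
and the "done_" prefix are hypothetical placeholders, and whether a resumed
run actually skips the finished invocations is not confirmed in this thread.

    type file;
    type Rscript;

    # Hypothetical: return a mapped marker file instead of an external and
    # pass its path to the R script as the last argument, which the script
    # must write before exiting successfully.
    app (file done) mxModelProcessor (file covMatrix, Rscript mxModProc,
                                      int modnum, float weight, string cond, int net)
    {
        RInvoke @filename(mxModProc) @filename(covMatrix) modnum weight cond net @filename(done);
    }

    # Hypothetical input file names; substitute the real mappings.
    file covMatrix <"covMatrix.txt">;
    Rscript mxScript <"mxModProc.R">;

    # One mapped marker file per array element, named with prefix "done_"
    # and suffix ".txt" by simple_mapper.
    file dbdone[] <simple_mapper; prefix="done_", suffix=".txt">;

    int totalperms[] = [1:200];
    float initweight = .5;
    int net = 1;

    foreach perm in totalperms {
        dbdone[perm] = mxModelProcessor(covMatrix, mxScript, perm, initweight, "speech", net);
    }

The marker files carry no data of their own. They only give Swift
something it manages and stages out, so that completion of each job is
recorded the same way as for any other mapped output.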