From tiberius at ci.uchicago.edu Tue May 6 09:33:47 2008
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 6 May 2008 09:33:47 -0500
Subject: [Swift-user] Workflow handling a large parallelism ?
Message-ID:

Hi

I need to build a fairly simple workflow (fan-in structure) that has a
relatively large number of parallel tasks (~1000).
Can you help me with a solution smarter than the one below?

//note, code is not debugged, will probably not run as is, however,
//this is how I would code it up in swift

type file{}

(file simOut)run_sim(int n){
    app{ run_sim n stdout=@filename(simOut) ; }
}

(file simMerged)merge_sim(file simFiles[]){
    app{ cat @filenames(simFiles) ; }
}

(file simFiles[]) batch_sim (){

  int simRange[] = [1:1000];
  foreach i in simRange {
      simFiles[i]=run_sim(i);
  }
}

//I am concerned about this, I would like to be able to generate the
//filenames in swift,
//not to be forced to list all the names

string filenames[] = [ "sim_000","sim_001", ... "sim_999" ];
file simOutputs[] ;
simOutputs = batch_sim();

file mergedSim;
mergedSim = merge_sim(simOutputs);

-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/

From benc at hawaga.org.uk Tue May 6 09:48:33 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 May 2008 14:48:33 +0000 (GMT)
Subject: [Swift-user] Workflow handling a large parallelism ?
In-Reply-To:
References:
Message-ID:

> (file simMerged)merge_sim(file simFiles[]){
>     app{ cat @filenames(simFiles) ; }
> }

send stdout into simMerged probably.

> (file simFiles[]) batch_sim (){
>
>   int simRange[] = [1:1000];
>   foreach i in simRange {
>       simFiles[i]=run_sim(i);
>   }
> }
>
> //I am concerned about this, I would like to be able to generate the
> filenames in swift,
> //not to be forced to list all the names
>
> string filenames[] = [ "sim_000","sim_001", ... "sim_999" ];
> file simOutputs[] ;

Use simple_mapper perhaps like this:

file simOutputs[] ;

I think you'll get 4 digit numbers that way, but perhaps you can cope with
that.

-- 

From uhasson at uchicago.edu Tue May 6 17:43:13 2008
From: uhasson at uchicago.edu (Uri Hasson)
Date: Tue, 6 May 2008 17:43:13 -0500 (CDT)
Subject: [Swift-user] format of multi lined functions in SWIFT
Message-ID: <20080506174313.AZT54213@m4500-03.uchicago.edu>

Hello all,

I would like to use SWIFT as a wrapper for deconvolution software we use.
A typical script is multi lined and when in shell, we use "\".
If I wanted to define a deconvolve function in Swift, I would want to pass
it 12 parameters.
Should I put a carriage return at the end of each line, use "\", or some
other escape character?

() my function () {
app {
line 1
continue line 1
end line 1
}
}

Here is how the command line looks in a shell script:

3dDeconvolve -jobs 4 -input $input_dir/runs_temp/scaled_run-$counter+orig -polort 3 -num_stimts 12 \
-stim_file 1 $current_dir/$x'[0]' -stim_label 1 classaction \
-stim_file 2 $current_dir/$x'[1]' -stim_label 2 verifyaction \
-stim_file 3 $current_dir/$x'[2]' -stim_label 3 classobj \
-stim_file 4 $current_dir/$x'[3]' -stim_label 4 verifyobj \
-stim_file 5 $current_dir/$x'[4]' -stim_label 5 testclass \
-stim_file 6 $current_dir/$x'[5]' -stim_label 6 testver \
-stim_file 7 $input_dir/runbyrun.motion.$counter'[1]' -stim_base 7 -stim_label 7 roll \
-stim_file 8 $input_dir/runbyrun.motion.$counter'[2]' -stim_base 8 -stim_label 8 pitch \
-stim_file 9 $input_dir/runbyrun.motion.$counter'[3]' -stim_base 9 -stim_label 9 yaw \
-stim_file 10 $input_dir/runbyrun.motion.$counter'[4]' -stim_base 10 -stim_label 10 dx \
-stim_file 11 $input_dir/runbyrun.motion.$counter'[5]' -stim_base 11 -stim_label 11 dy \
-stim_file 12 $input_dir/runbyrun.motion.$counter'[6]' -stim_base 12 -stim_label 12 dz \
-stim_maxlag 1 17 -stim_maxlag 2 17 -stim_maxlag 3 17 -stim_maxlag 4 17 -stim_maxlag 5 17 -stim_maxlag 6 17 \
-censor ./censor.prep/run"$counter".censor \
-tout -bout -nofullf_atall -nodmbase -xsave -nfirst 0 \
-iresp 1 classaction.irf.run$counter -iresp 2 verifyaction.irf.run$counter -iresp 3 classobj.irf.run$counter -iresp 4 verifyobj.irf.run$counter \
-bucket Ss5.deconmean.reg1strunCensor.run_$counter

From benc at hawaga.org.uk Wed May 7 05:29:24 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 7 May 2008 10:29:24 +0000 (GMT)
Subject: [Swift-user] Re: format of multi lined functions in SWIFT
In-Reply-To: <20080506174313.AZT54213@m4500-03.uchicago.edu>
References: <20080506174313.AZT54213@m4500-03.uchicago.edu>
Message-ID:

> A typical script is multi lined and when in shell, we use "\".

The multilinedness you show in your example command line looks like it's
only for readability. I think you could equally well (behaviour wise)
remove all the \newlines and have it all on one line. If so, then you
don't need to pass any newline characters through to the app.

In Swift you can split parameters over multiple lines anyway, without
needing any special separator character - the command is terminated when a
; is reached.

So you can probably write something like this:

myapp "-jobs" "4" "-input" "-num_stimts 12"
 "-stim_file" "1" "$current_dir/$x'[0]'" "-stim_label" "1" "classaction"
 .
 .
 .
;

> "-stim_file" "2" "$current_dir/$x'[1]'" -stim_label 2 verifyaction \
> -stim_file 3 $current_dir/$x'[2]' -stim_label 3 classobj \
> -stim_file 4 $current_dir/$x'[3]' -stim_label 4 verifyobj \
> -stim_file 5 $current_dir/$x'[4]' -stim_label 5 testclass \
> -stim_file 6 $current_dir/$x'[5]' -stim_label 6 testver \
> -stim_file 7 $input_dir/runbyrun.motion.$counter'[1]'
> -stim_base 7 -stim_label 7 roll \
> -stim_file 8 $input_dir/runbyrun.motion.$counter'[2]'
> -stim_base 8 -stim_label 8 pitch \
> -stim_file 9 $input_dir/runbyrun.motion.$counter'[3]'
> -stim_base 9 -stim_label 9 yaw \
> -stim_file 10 $input_dir/runbyrun.motion.$counter'[4]'
> -stim_base 10 -stim_label 10 dx \
> -stim_file 11 $input_dir/runbyrun.motion.$counter'[5]'
> -stim_base 11 -stim_label 11 dy \
> -stim_file 12 $input_dir/runbyrun.motion.$counter'[6]'
> -stim_base 12 -stim_label 12 dz \
> -stim_maxlag 1 17 -stim_maxlag 2 17 -stim_maxlag 3 17
> -stim_maxlag 4 17 -stim_maxlag 5 17 -stim_maxlag 6 17 \
> -censor ./censor.prep/run"$counter".censor \
> -tout -bout -nofullf_atall -nodmbase -xsave -nfirst 0 \
> -iresp 1 classaction.irf.run$counter -iresp 2
> verifyaction.irf.run$counter -iresp 3 classobj.irf.run$counter
> -iresp 4 verifyobj.irf.run$counter \
> -bucket Ss5.deconmean.reg1strunCensor.run_$counter
>

From tiberius at ci.uchicago.edu Wed May 7 09:44:59 2008
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Wed, 7 May 2008 09:44:59 -0500
Subject: [Swift-user] Workflow handling a large parallelism ?
In-Reply-To:
References:
Message-ID:

This worked, thank you !
The simple_mapper is a bit unintuitive, I would call it
sequential_file_mapper or something like that,
and remove the limitation that file names get only 4 digits generated
automatically (and zero-padded).

Tibi

On Tue, May 6, 2008 at 9:48 AM, Ben Clifford wrote:
>
> > (file simMerged)merge_sim(file simFiles[]){
> >     app{ cat @filenames(simFiles) ; }
> > }
>
> send stdout into simMerged probably.
> >
> > (file simFiles[]) batch_sim (){
> >
> >   int simRange[] = [1:1000];
> >   foreach i in simRange {
> >       simFiles[i]=run_sim(i);
> >   }
> > }
> >
> > //I am concerned about this, I would like to be able to generate the
> > filenames in swift,
> > //not to be forced to list all the names
> >
> > string filenames[] = [ "sim_000","sim_001", ... "sim_999" ];
> > file simOutputs[] ;
>
> Use simple_mapper perhaps like this:
>
> file simOutputs[] ;
>
> I think you'll get 4 digit numbers that way, but perhaps you can cope with
> that.
>
> --
>
>

-- 
Tiberiu (Tibi) Stef-Praun, PhD
Research Staff, Computation Institute
5640 S. Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/

From benc at hawaga.org.uk Wed May 7 10:45:47 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 7 May 2008 15:45:47 +0000 (GMT)
Subject: [Swift-user] Workflow handling a large parallelism ?
In-Reply-To:
References:
Message-ID:

On Wed, 7 May 2008, Tiberiu Stef-Praun wrote:

> This worked, thank you !
> The simple_mapper is a bit unintuitive, I would call it
> sequential_file_mapper or something like that

no more renaming!

> and remove the limitation that file names get only 4 digits generated
> automatically (and zero-padded).

Perhaps a padding parameter that allows either padding to be specified
(and to what width) or no padding at all. So you might say padding=4 to
get 4 digits (the present behaviour) or padding=0 to get no padding at
all. This is listed as bug 139 now.

This is only necessary for output files. For input files, the names
already exist, though there is a bug open about some other aspect of this
causing a problem (bug 116).
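
To make the naming question concrete, here is a minimal shell sketch of
what zero-padded versus unpadded sequence names look like. It is only an
illustration of the naming scheme, not of how the mapper is implemented:
the sim_ prefix comes from the example earlier in this thread, the
make_name helper is made up for this sketch, and padding is so far only a
proposed parameter (bug 139), not an existing mapper option.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): build a file name from a
# prefix, an index and a padding width. A width of 0 means "no padding".
make_name() {
    prefix=$1
    index=$2
    width=$3
    if [ "$width" -gt 0 ]; then
        # Build a format such as %s%04d so the index is zero-padded.
        printf "%s%0${width}d\n" "$prefix" "$index"
    else
        printf '%s%d\n' "$prefix" "$index"
    fi
}

make_name sim_ 7 4   # prints sim_0007 (padding=4, the present behaviour)
make_name sim_ 7 0   # prints sim_7    (padding=0, no padding)
```

With a width of 4 the names sort correctly up to index 9999; with no
padding, a plain lexical sort would put sim_10 before sim_2.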
-- 

From jamalphd at gmail.com Sun May 11 23:28:54 2008
From: jamalphd at gmail.com (J A)
Date: Mon, 12 May 2008 00:28:54 -0400
Subject: [Swift-user] format of multi lined functions in SWIFT
In-Reply-To: <20080506174313.AZT54213@m4500-03.uchicago.edu>
References: <20080506174313.AZT54213@m4500-03.uchicago.edu>
Message-ID:

Hi All:

I am new to Swift and have several questions

1. Can I run swift in a windows application? if yes, how?
2. Can I call other code (written in another language) from swift?
3. How can swift work with PBS?

Thanks,
Jamal

From jamalphd at gmail.com Sun May 11 23:29:43 2008
From: jamalphd at gmail.com (J A)
Date: Mon, 12 May 2008 00:29:43 -0400
Subject: [Swift-user] Questions
Message-ID:

>
> Hi All:
>
> I am new to Swift and have several questions
>
> 1. Can I run swift in a windows application? if yes, how?
> 2. Can I call other code (written in another language) from swift?
> 3. How can swift work with PBS?
>
>
> Thanks,
> Jamal
>

From benc at hawaga.org.uk Mon May 12 03:52:01 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 12 May 2008 08:52:01 +0000 (GMT)
Subject: [Swift-user] Questions
In-Reply-To:
References:
Message-ID:

On Mon, 12 May 2008, J A wrote:

> > 1. Can I run swift in a windows application? if yes, how?

You probably can run Swift on Windows as a client, but you will need a
unix box to perform the actual execution of work.

> > 2. Can I call other code (written in another language) from swift?

Swift calls out to unix executables; if you can compile your code into a
unix executable, then you can call your code that way.

It will not be a JNI / function level link.
It will be using Swift's file transfer+execute mechanism for calling
executables:
  input data files are copied to whichever machine you will run on
  your code is executed
  output data files are copied back

> > 3. How can swift work with PBS?

There are a couple of mechanisms. If you are running on the head node of a
PBS cluster, you can use the PBS execution provider. If you are submitting
to a cluster which has Globus GRAM and PBS both installed, then you can
use one of the GRAM execution providers. The choice of provider is
specified in the site catalog, libexec/sites.xml.

The user guide contains a section '16. The site catalog' which gives some
details about configuring this.

-- 

From jamalphd at gmail.com Mon May 12 17:38:03 2008
From: jamalphd at gmail.com (J A)
Date: Mon, 12 May 2008 18:38:03 -0400
Subject: [Swift-user] Questions
In-Reply-To:
References:
Message-ID:

Hi Ben:

Thanks for your reply.

Does anyone have an example of how to do item #2 below?

Thanks for your cooperation.

Jamal

On 5/12/08, Ben Clifford wrote:
>
>
> On Mon, 12 May 2008, J A wrote:
>
> > > 1. Can I run swift in a windows application? if yes, how?
>
> You probably can run Swift on Windows as a client, but you will need a
> unix box to perform the actual execution of work.
>
> > > 2. Can I call other code (written in another language) from swift?
>
> Swift calls out to unix executables; if you can compile your code into a
> unix executable, then you can call your code that way.
>
> It will not be a JNI / function level link. It will be using Swift's file
> transfer+execute mechanism for calling executables:
>   input data files are copied to whichever machine you will run on
>   your code is executed
>   output data files are copied back
>
> > > 3. How can swift work with PBS?
>
> There are a couple of mechanisms. If you are running on the head node of a
> PBS cluster, you can use the PBS execution provider. If you are submitting
> to a cluster which has Globus GRAM and PBS both installed, then you can
> use one of the GRAM execution providers. The choice of provider is
> specified in the site catalog, libexec/sites.xml.
>
> The user guide contains a section '16. The site catalog' which gives some
> details about configuring this.
>
> --
>
>

From benc at hawaga.org.uk Tue May 13 11:09:07 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 13 May 2008 16:09:07 +0000 (GMT)
Subject: [Swift-user] Questions
In-Reply-To:
References:
Message-ID:

> > > > 2. Can I call other code (written in another language) from swift?

> > Swift calls out to unix executables; if you can compile your code into a
> > unix executable, then you can call your code that way.

> > It will not be a JNI / function level link. It will be using Swift's file
> > transfer+execute mechanism for calling executables:
> >   input data files are copied to whichever machine you will run on
> >   your code is executed
> >   output data files are copied back

> Does anyone have an example of how to do item #2 below?

In section 3.4 of the tutorial at
http://www.ci.uchicago.edu/swift/guides/tutorial.php
there is an example of one procedure, greeting(), outputting some data (a
text message) to a file and another procedure, countwords(), reading that
data in and counting the words.

If you are familiar with shell, the example is basically doing:

  echo hello from swift > q13greeting.txt
  wc -w q13greeting.txt

Each of those two command lines can be run in different places and at
different times by Swift (and, if they didn't have a data dependency on
q13greeting.txt between them, potentially in parallel).

So each piece of your code that you want Swift to be able to execute
separately needs to be made into its own separate command line program,
like echo or wc above (but presumably much larger - for example, minutes
of computation).
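
As a rough illustration of that pattern outside Swift, here is a minimal
shell sketch in which the two steps are separate commands that communicate
only through files. The names greeting and countwords mirror the tutorial
procedures; the count.txt output name and the scratch directory are
assumptions made just for this sketch.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch does not clutter the
# current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Step 1: stands in for a program that only writes its output file.
greeting() {
    echo hello from swift > "$1"
}

# Step 2: stands in for a program that only reads its input file and
# writes its result to a second file.
countwords() {
    wc -w < "$1" > "$2"
}

# The only link between the two steps is the file q13greeting.txt,
# which is the kind of data dependency a workflow system can track.
greeting q13greeting.txt
countwords q13greeting.txt count.txt
cat count.txt
```

Because neither step knows about the other, each could in principle be
staged to a different machine as long as q13greeting.txt is copied along.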
-- 

From iraicu at cs.uchicago.edu Tue May 13 19:45:18 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 13 May 2008 19:45:18 -0500
Subject: [Swift-user] Falkon and Swift talk at GlobusWorld08
Message-ID: <482A361E.60101@cs.uchicago.edu>

Hi all,
In case any of you are at GlobusWorld08 in Oakland, California this week,
I just wanted to point out that I will be giving a short talk tomorrow on
Swift and Falkon. My slides are at:
http://people.cs.uchicago.edu/~iraicu/presentations/2008_Falkon_Swift_GlobusWorld08_5-14-08.pdf
If any of you are at the conference, it would be great to have you in the
audience!

Cheers,
Ioan

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dev.globus.org/wiki/Incubator/Falkon
       http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

From benc at hawaga.org.uk Wed May 21 13:41:44 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 21 May 2008 18:41:44 +0000 (GMT)
Subject: [Swift-user] does anyone use vdlc?
Message-ID:

At present there is a command bin/vdlc in the distribution that compiles
but does not execute a workflow.

I use it in some of the regression tests (the ones in the tests/language/
directory).

Does anyone use it? I'd like to simplify it (perhaps remove it) in order
to remove unused functionality; but I do not want to remove functionality
that people actually use.

-- 

From jamalphd at gmail.com Thu May 22 11:56:36 2008
From: jamalphd at gmail.com (J A)
Date: Thu, 22 May 2008 12:56:36 -0400
Subject: [Swift-user] does anyone use vdlc?
In-Reply-To:
References:
Message-ID:

Hi All:

I am trying to use Swift with PBS. I configured sites.xml based on the user
guide but when I submit a job, it finishes execution but no output is
produced. I am executing the first.swift workflow.

I was reading the documents and came across some instructions that mention
chicago_sites.xml as a reference on how to configure sites.xml to use PBS,
but I couldn't find this file.

Any help on how to set up Swift with PBS will be really appreciated.

Thanks,
Jamal

From benc at hawaga.org.uk Fri May 23 12:07:21 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 23 May 2008 17:07:21 +0000 (GMT)
Subject: [Swift-user] pbs
In-Reply-To:
References:
Message-ID:

On Thu, 22 May 2008, J A wrote:

> I am trying to use Swift with PBS. I configured sites.xml based on the user
> guide but when I submit a job, it finishes execution but no output is
> produced. I am executing the first.swift workflow.

What do you have in sites.xml?
Have you modified first.swift?
Are you trying to submit directly to PBS or through GRAM?
Please put a log file of a run online somewhere.

Here is an example direct PBS submission sites file:

/home/benc

-- 

From benc at hawaga.org.uk Sun May 25 05:32:29 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 25 May 2008 10:32:29 +0000 (GMT)
Subject: [Swift-user] what is wrong with restart5.swift?
Message-ID:

Look at the below code and tell me if you think it should work or not
(i.e. if it's buggy or not). It fails for me with a 'multiple mappings
point to the same file' error.

helperA, B, and C all write the word 'foo' into the filename passed as
argument (so most of the intermediate data ends up getting ignored
deliberately).
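
For readers who want to reproduce this, a helper of the kind described
above could be as small as the following shell sketch. This is an assumed
stand-in, not the actual helperA/helperB/helperC used in the test suite;
the function name helper and the temporary file are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for helperA/helperB/helperC: write the word
# "foo" into the file named by the first argument. Nothing is read,
# so whatever input the workflow staged in is deliberately ignored.
helper() {
    echo foo > "$1"
}

out=$(mktemp)
helper "$out"
cat "$out"   # prints: foo
```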
(I came up with this whilst working on testing for concurrent mapper and
restarts, but it fails irrespective of restarts when I think it should
work; however the reason for failure does not leap out at me)

type file;

(file t) a(file i) {
    app {
        helperA @filename(t);
    }
}

(file t) b(file i) {
    app {
        helperB @filename(t);
    }
}

(file t) c(file i) {
    app {
        helperC @filename(t);
    }
}

(file r) q(file i, int n) {
    file t;
    switch(n) {
        case 1: t=a(i); r=c(t);
        case 2: t=b(i); r=c(t);
        case 3: t=c(i); r=c(t);
    }
}

file J <"restart.in">;

file X[];

file Y[];

foreach i in [1:3] {
    X[i] = q(J,i);
}

foreach x,j in X {
    Y[j] = q(x,j);
}

From iraicu at cs.uchicago.edu Sun May 25 08:08:08 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Sun, 25 May 2008 08:08:08 -0500
Subject: [Swift-user] [Fwd: Falkon v1.0 release]
Message-ID: <483964B8.2010402@cs.uchicago.edu>

Hi Swift community,
Just wanted to pass the announcement on Falkon along.

Cheers,
Ioan
-------------- next part --------------
An embedded message was scrubbed...
From: Ioan Raicu
Subject: Falkon v1.0 release
Date: Sun, 25 May 2008 08:06:35 -0500
Size: 8771
URL:

From hategan at mcs.anl.gov Sun May 25 11:56:44 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 11:56:44 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To:
References:
Message-ID: <1211734604.10072.0.camel@localhost>

It doesn't look like it should fail.

On Sun, 2008-05-25 at 10:32 +0000, Ben Clifford wrote:
> Look at the below code and tell me if you think it should work or not
> (i.e. if it's buggy or not). It fails for me with a 'multiple mappings
> point to the same file' error.
>
> helperA, B, and C all write the word 'foo' into the filename passed as
> argument (so most of the intermediate data ends up getting ignored
> deliberately).
>
> (I came up with this whilst working on testing for concurrent mapper and
> restarts, but it fails irrespective of restarts when I think it should
> work; however the reason for failure does not leap out at me)
>
> type file;
>
> (file t) a(file i) {
>     app {
>         helperA @filename(t);
>     }
> }
>
> (file t) b(file i) {
>     app {
>         helperB @filename(t);
>     }
> }
>
> (file t) c(file i) {
>     app {
>         helperC @filename(t);
>     }
> }
>
> (file r) q(file i, int n) {
>     file t;
>     switch(n) {
>         case 1: t=a(i); r=c(t);
>         case 2: t=b(i); r=c(t);
>         case 3: t=c(i); r=c(t);
>     }
> }
>
> file J <"restart.in">;
>
> file X[];
>
> file Y[];
>
> foreach i in [1:3] {
>     X[i] = q(J,i);
> }
>
> foreach x,j in X {
>     Y[j] = q(x,j);
> }
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Sun May 25 12:09:21 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 12:09:21 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211734604.10072.0.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
Message-ID: <1211735361.10072.2.camel@localhost>

Looks like the channel for the second iteration is getting all values
twice.

On Sun, 2008-05-25 at 11:56 -0500, Mihael Hategan wrote:
> It doesn't look like it should fail.
>
> On Sun, 2008-05-25 at 10:32 +0000, Ben Clifford wrote:
> > Look at the below code and tell me if you think it should work or not
> > (i.e. if it's buggy or not). It fails for me with a 'multiple mappings
> > point to the same file' error.
> >
> > helperA, B, and C all write the word 'foo' into the filename passed as
> > argument (so most of the intermediate data ends up getting ignored
> > deliberately).
> >
> > (I came up with this whilst working on testing for concurrent mapper and
> > restarts, but it fails irrespective of restarts when I think it should
> > work; however the reason for failure does not leap out at me)
> >
> > type file;
> >
> > (file t) a(file i) {
> >     app {
> >         helperA @filename(t);
> >     }
> > }
> >
> > (file t) b(file i) {
> >     app {
> >         helperB @filename(t);
> >     }
> > }
> >
> > (file t) c(file i) {
> >     app {
> >         helperC @filename(t);
> >     }
> > }
> >
> > (file r) q(file i, int n) {
> >     file t;
> >     switch(n) {
> >         case 1: t=a(i); r=c(t);
> >         case 2: t=b(i); r=c(t);
> >         case 3: t=c(i); r=c(t);
> >     }
> > }
> >
> > file J <"restart.in">;
> >
> > file X[];
> >
> > file Y[];
> >
> > foreach i in [1:3] {
> >     X[i] = q(J,i);
> > }
> >
> > foreach x,j in X {
> >     Y[j] = q(x,j);
> > }
> >
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Sun May 25 12:18:48 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 12:18:48 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211735361.10072.2.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
Message-ID: <1211735928.10072.5.camel@localhost>

On Sun, 2008-05-25 at 12:09 -0500, Mihael Hategan wrote:
> Looks like the channel for the second iteration is getting all values
> twice.
>

Probably because these get closed twice:

2008-05-25 11:57:56,222 INFO CloseDataset Closing org.griphyn.vdl.mapping.DataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20080525-1157-jp27tr6a:720000000012 with no value at dataset=X path=[3] (not closed)
2008-05-25 11:57:56,223 INFO CloseDataset Closing org.griphyn.vdl.mapping.DataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20080525-1157-jp27tr6a:720000000012 with no value at dataset=X path=[3] (closed)

From hategan at mcs.anl.gov Sun May 25 12:36:25 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 12:36:25 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211735928.10072.5.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
Message-ID: <1211736985.10072.7.camel@localhost>

On Sun, 2008-05-25 at 12:18 -0500, Mihael Hategan wrote:
> On Sun, 2008-05-25 at 12:09 -0500, Mihael Hategan wrote:
> > Looks like the channel for the second iteration is getting all values
> > twice.
> >
>
> Probably because these get closed twice:
>

Once at the end of one of the helpers, and once at the end of q.

>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Sun May 25 12:45:53 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 12:45:53 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211736985.10072.7.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
 <1211736985.10072.7.camel@localhost>
Message-ID: <1211737553.10072.10.camel@localhost>

On Sun, 2008-05-25 at 12:36 -0500, Mihael Hategan wrote:
> On Sun, 2008-05-25 at 12:18 -0500, Mihael Hategan wrote:
> > On Sun, 2008-05-25 at 12:09 -0500, Mihael Hategan wrote:
> > > Looks like the channel for the second iteration is getting all values
> > > twice.
> > >
> >
> > Probably because these get closed twice:
> >
>
> Once at the end of one of the helpers, and once at the end of q.

Should be fixed in r2004.

However, I do have a question. Given that such a usage scenario is not
uncommon, how come we've never seen this before?

>
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From benc at hawaga.org.uk Sun May 25 13:03:34 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 25 May 2008 18:03:34 +0000 (GMT)
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211737553.10072.10.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
 <1211736985.10072.7.camel@localhost>
 <1211737553.10072.10.camel@localhost>
Message-ID:

On Sun, 25 May 2008, Mihael Hategan wrote:

> However, I do have a question. Given that such a usage scenario is not
> uncommon, how come we've never seen this before?

I'm not sure how common two foreach loops in a row are in real usage. I
think I'd tend to fold the two loops into a single block if I was doing
something for real.
Brief playing about seems to suggest it happens with foreach loops where
the input array is generated over time but not from a range specification
e.g. [1:100] or from an input dataset.

Plus it might be a regression.

-- 

From hategan at mcs.anl.gov Sun May 25 13:58:03 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 13:58:03 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To:
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
 <1211736985.10072.7.camel@localhost>
 <1211737553.10072.10.camel@localhost>
Message-ID: <1211741883.21337.0.camel@localhost>

Hmm. 07511 fails for me:
expecting 07511-fixed-array-mapper-input.*.expected
07511-fixed-array-mapper-input.*.expected does not exist

On Sun, 2008-05-25 at 18:03 +0000, Ben Clifford wrote:
> On Sun, 25 May 2008, Mihael Hategan wrote:
>
> > However, I do have a question. Given that such a usage scenario is not
> > uncommon, how come we've never seen this before?
>
> I'm not sure how common two foreach loops in a row are in real usage. I
> think I'd tend to fold the two loops into a single block if I was doing
> something for real.
>
> Brief playing about seems to suggest it happens with foreach loops where
> the input array is generated over time but not from a range specification
> e.g. [1:100] or from an input dataset.
>
> Plus it might be a regression.
>

From benc at hawaga.org.uk Sun May 25 14:03:40 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 25 May 2008 19:03:40 +0000 (GMT)
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211741883.21337.0.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
 <1211736985.10072.7.camel@localhost>
 <1211737553.10072.10.camel@localhost>
 <1211741883.21337.0.camel@localhost>
Message-ID:

On Sun, 25 May 2008, Mihael Hategan wrote:

> Hmm. 07511 fails for me:
> expecting 07511-fixed-array-mapper-input.*.expected
> 07511-fixed-array-mapper-input.*.expected does not exist

That's not a fail, though it's a bad way to express success.

It is saying that the test case has no expected output files, so no
output is being checked; I'll rephrase the message to something better.

(here's what it looks like for me)

$ ./run 07511-fixed-array-mapper-input
Removing files from previous runs
Running test 07511-fixed-array-mapper-input
Swift svn swift-r2004 (Swift modified locally) cog-r2023

RunID: 20080525-2000-qgh25dzc
Progress:
echo started
echo started
echo started
echo completed
echo completed
echo completed
Final status: Finished successfully:3
expecting 07511-fixed-array-mapper-input.*.expected
07511-fixed-array-mapper-input.*.expected does not exist
----------===========================----------
All language behaviour tests passed

-- 

From hategan at mcs.anl.gov Sun May 25 14:06:53 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 25 May 2008 14:06:53 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To:
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
 <1211736985.10072.7.camel@localhost>
 <1211737553.10072.10.camel@localhost>
 <1211741883.21337.0.camel@localhost>
Message-ID: <1211742413.21466.3.camel@localhost>

My bad. There was a failure, but somewhere else. And that was because I
didn't re-compile swift.

On Sun, 2008-05-25 at 19:03 +0000, Ben Clifford wrote:
> On Sun, 25 May 2008, Mihael Hategan wrote:
>
> > Hmm. 07511 fails for me:
> > expecting 07511-fixed-array-mapper-input.*.expected
> > 07511-fixed-array-mapper-input.*.expected does not exist
>
> That's not a fail, though it's a bad way to express success.
>
> It is saying that the test case has no expected output files, so no
> output is being checked; I'll rephrase the message to something better.
>
> (here's what it looks like for me)
>
> $ ./run 07511-fixed-array-mapper-input
> Removing files from previous runs
> Running test 07511-fixed-array-mapper-input
> Swift svn swift-r2004 (Swift modified locally) cog-r2023
>
> RunID: 20080525-2000-qgh25dzc
> Progress:
> echo started
> echo started
> echo started
> echo completed
> echo completed
> echo completed
> Final status: Finished successfully:3
> expecting 07511-fixed-array-mapper-input.*.expected
> 07511-fixed-array-mapper-input.*.expected does not exist
> ----------===========================----------
> All language behaviour tests passed
>
>

From benc at hawaga.org.uk Sun May 25 14:22:57 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 25 May 2008 19:22:57 +0000 (GMT)
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To: <1211737553.10072.10.camel@localhost>
References: <1211734604.10072.0.camel@localhost>
 <1211735361.10072.2.camel@localhost>
 <1211735928.10072.5.camel@localhost>
 <1211736985.10072.7.camel@localhost>
 <1211737553.10072.10.camel@localhost>
Message-ID:

On Sun, 25 May 2008, Mihael Hategan wrote:

> Should be fixed in r2004.

It runs ok without restarts using r2004.

I put it in tests/misc/restart5.sh, as three runs:

 i) run without restart, expecting a successful end.

 ii) run with B set to fail and A and C set to succeed, expecting a fail

 iii) restart the run from ii) with A set to fail and B and C set to
      succeed, expecting success. This should check that all of the A
      jobs get run the first time and that their output is logged for
      restart successfully (because A cannot be called in part iii).

During my viewing of Eurovision last night, I was poking round with some
of the stuff I talked about wrt scope identification.
As far as I can tell, the way that VDLFunction.getThreadPrefix works at
the moment is roughly the "right thing" to do as far as that is
concerned:

New SwiftScript scopes are always represented in Karajan as a new thread,
and those threads are labelled by their position in the source code
(they get numbered in accordance with their position in a block, so
this is invariant wrt restarts) or, if a stack frame has a $ variable
(the foreach iteration variable container), by the position in the array
being iterated instead of the thread number.

This is pretty much the behaviour I think is desirable (modulo the fact
that this implementation probably doesn't work for iterate{}, but I think
that is easily fixable).

So I think filename based concurrent mapping works with restarts
(modulo iterate{}).

I also think that is (was?) the only correctness problem with restarts at
the moment.

--

From hategan at mcs.anl.gov Mon May 26 11:55:16 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 26 May 2008 11:55:16 -0500
Subject: [Swift-user] what is wrong with restart5.swift?
In-Reply-To:
References: <1211734604.10072.0.camel@localhost> <1211735361.10072.2.camel@localhost> <1211735928.10072.5.camel@localhost> <1211736985.10072.7.camel@localhost> <1211737553.10072.10.camel@localhost>
Message-ID: <1211820916.3593.2.camel@localhost>

On Sun, 2008-05-25 at 19:22 +0000, Ben Clifford wrote:
> On Sun, 25 May 2008, Mihael Hategan wrote:
> So I think filename based concurrent mapping works with restarts
> (modulo iterate{}).
>
> I also think that is (was?) the only correctness problem with restarts at
> the moment.

There was also the ability to persuade a mapper (in particular temp
mappers) to use specific files instead of something of their choice.
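
The labelling scheme Ben describes above (a scope is identified by its
position in the enclosing block, or by the array index when it is a
foreach iteration) can be illustrated with a toy shell function. This is
not the code in VDLFunction.getThreadPrefix: the name scope_label and the
label syntax are invented here purely for illustration.

```shell
# Hypothetical illustration only; NOT Swift's VDLFunction.getThreadPrefix.
# Usage: scope_label PARENT POSITION [INDEX]
#   PARENT   label of the enclosing scope
#   POSITION position of this scope within its block in the source
#   INDEX    array index, given only for a foreach iteration
scope_label() {
    parent=$1
    position=$2
    index=${3-}
    if [ -n "$index" ]; then
        # foreach iteration: the array position replaces the thread number
        printf '%s-[%s]\n' "$parent" "$index"
    else
        # ordinary scope: numbered by its position in the block
        printf '%s-%s\n' "$parent" "$position"
    fi
}

# scope_label 0 3        prints 0-3
# scope_label 0-3 1 42   prints 0-3-[42]
```

Neither input depends on the order in which threads happen to run, so the
same scope gets the same label on a restart, which is the property the
message relies on.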
>

From lixi at uchicago.edu Wed May 28 12:18:27 2008
From: lixi at uchicago.edu (lixi at uchicago.edu)
Date: Wed, 28 May 2008 12:18:27 -0500 (CDT)
Subject: [Swift-user] Swift finished with errors
Message-ID: <20080528121827.BAQ66049@m4500-03.uchicago.edu>

Hi,

I just ran a simple workflow on multiple OSG sites. It failed with
errors several times, and each time the command line output showed
similar errors:

...
node failed
Execution failed:
Failed to link input file
_concurrent/intermediatefile-9c469a2f-4d9f-47a0-a660-eb1634b97559-

According to the log file, it seemed that this failed job was
submitted to site "UCSDT2". However, both data transfer and
globus job execution could be done successfully on this site.
It seems very strange to me.

The log file is on CI host:
/home/lixi/newswift/latest/score/100/workflowtest-20080528-1136-qovzbq70.log

Could you please help me find the cause and a solution?

Thanks a lot!

Xi

From benc at hawaga.org.uk Wed May 28 17:54:11 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 28 May 2008 22:54:11 +0000 (GMT)
Subject: [Swift-user] Swift finished with errors
In-Reply-To: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
Message-ID:

can you run examples/vdsk/first.swift on that site UCSDT2 (using the same
sites.xml entry as you used for this workflow, without the other 7 sites
in it) ?

--

From lixi at uchicago.edu Wed May 28 18:31:39 2008
From: lixi at uchicago.edu (lixi at uchicago.edu)
Date: Wed, 28 May 2008 18:31:39 -0500 (CDT)
Subject: [Swift-user] Swift finished with errors
Message-ID: <20080528183139.BAR28364@m4500-03.uchicago.edu>

Yes, it can do that. But I think that first.swift doesn't
produce any intermediate file.

Thanks,

Xi

---- Original message ----
>Date: Wed, 28 May 2008 22:54:11 +0000 (GMT)
>From: Ben Clifford
>Subject: Re: [Swift-user] Swift finished with errors
>To: lixi at uchicago.edu
>Cc: swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu
>
>can you run examples/vdsk/first.swift on that site UCSDT2 (using the same
>sites.xml entry as you used for this workflow, without the other 7 sites
>in it) ?
>
>--
>
>

From benc at hawaga.org.uk Wed May 28 19:35:11 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 00:35:11 +0000 (GMT)
Subject: [Swift-user] Swift finished with errors
In-Reply-To: <20080528183139.BAR28364@m4500-03.uchicago.edu>
References: <20080528183139.BAR28364@m4500-03.uchicago.edu>
Message-ID:

On Wed, 28 May 2008, lixi at uchicago.edu wrote:

> Yes, it can do that. But I think that first.swift doesn't
> produce any intermediate file.

There's a test, tests/language-behaviour/062-two-in-a-row.swift that has
an intermediate file, if you specifically want to test that (as of swift
svn r2000)

--

From benc at hawaga.org.uk Wed May 28 19:53:02 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 00:53:02 +0000 (GMT)
Subject: [Swift-user] Swift finished with errors
In-Reply-To:
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
Message-ID:

are you running this with replication enabled? if so, that's broken. don't
use it.

--

From benc at hawaga.org.uk Wed May 28 20:36:31 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 01:36:31 +0000 (GMT)
Subject: [Swift-user] Swift finished with errors
In-Reply-To:
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
Message-ID:

I get a similar error running the site tests (tests/sites/run-site) using
your site definition.

--

From hategan at mcs.anl.gov Wed May 28 20:45:35 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 28 May 2008 20:45:35 -0500
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To:
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu>
Message-ID: <1212025535.11944.0.camel@localhost>

On Thu, 2008-05-29 at 00:53 +0000, Ben Clifford wrote:
> are you running this with replication enabled? if so, that's broken. don't
> use it.

How so?

From benc at hawaga.org.uk Wed May 28 20:54:16 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 01:54:16 +0000 (GMT)
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To: <1212025535.11944.0.camel@localhost>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost>
Message-ID:

On Wed, 28 May 2008, Mihael Hategan wrote:

> On Thu, 2008-05-29 at 00:53 +0000, Ben Clifford wrote:
> > are you running this with replication enabled? if so, that's broken. don't
> > use it.
>
> How so?

this:

http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-May/003140.html

is an artificial reconstruction of a problem that Xi seemed to encounter
in real life, where jobs seem to get overreplicated.

--

From hategan at mcs.anl.gov Wed May 28 21:14:56 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 28 May 2008 21:14:56 -0500
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To:
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost>
Message-ID: <1212027296.12318.2.camel@localhost>

On Thu, 2008-05-29 at 01:54 +0000, Ben Clifford wrote:
> On Wed, 28 May 2008, Mihael Hategan wrote:
>
> > On Thu, 2008-05-29 at 00:53 +0000, Ben Clifford wrote:
> > > are you running this with replication enabled? if so, that's broken. don't
> > > use it.
> >
> > How so?
>
> this:
>
> http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-May/003140.html

I had a feeling that r2004 should have solved that, but I see that it
might not.

>
> is an artificial reconstruction of a problem that Xi seemed to encounter
> in real life, where jobs seem to get overreplicated.
>

From benc at hawaga.org.uk Wed May 28 21:23:37 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 02:23:37 +0000 (GMT)
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To: <1212027296.12318.2.camel@localhost>
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost>
Message-ID:

On Wed, 28 May 2008, Mihael Hategan wrote:

> I had a feeling that r2004 should have solved that, but I see that it
> might not.

It doesn't work for me in a run I just tried.

--

From hategan at mcs.anl.gov Wed May 28 22:06:34 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 28 May 2008 22:06:34 -0500
Subject: [Swift-devel] Re: [Swift-user] Swift finished with errors
In-Reply-To:
References: <20080528121827.BAQ66049@m4500-03.uchicago.edu> <1212025535.11944.0.camel@localhost> <1212027296.12318.2.camel@localhost>
Message-ID: <1212030394.13449.0.camel@localhost>

On Thu, 2008-05-29 at 02:23 +0000, Ben Clifford wrote:
> On Wed, 28 May 2008, Mihael Hategan wrote:
>
> > I had a feeling that r2004 should have solved that, but I see that it
> > might not.
>
> It doesn't work for me in a run I just tried.

Ok. Good to know.

>

From lixi at uchicago.edu Thu May 29 16:08:51 2008
From: lixi at uchicago.edu (lixi at uchicago.edu)
Date: Thu, 29 May 2008 16:08:51 -0500 (CDT)
Subject: [Swift-user] A weird result
Message-ID: <20080529160851.BAS36890@m4500-03.uchicago.edu>

Hi,

Just now I encountered a weird result when running a Swift
workflow. The command line output is as follows:

...
Progress: Stage out:1 Finished successfully:1000
node completed
Final status: Finished successfully:1001

It seemed that all jobs had finished successfully, but
everything stopped at this point. The execution doesn't
return. In the log file, the last line is:

2008-05-29 15:20:24,689-0500 INFO vdl:cleanup END dir=workflowtest-20080529-1145-n44o2cj1 host=AGLT2

Normally, it should be the second-to-last line, and the last
line should be "...Swift finished with no errors".

The log file is on CI:
/home/lixi/newswift/latest/score/1000/workflowtest-20080529-1145-n44o2cj1.log

This is the first time I have encountered such a
situation, and I don't know why it happened.

Thanks for your attention.

Xi

From benc at hawaga.org.uk Thu May 29 17:42:09 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 May 2008 22:42:09 +0000 (GMT)
Subject: [Swift-user] A weird result
In-Reply-To: <20080529160851.BAS36890@m4500-03.uchicago.edu>
References: <20080529160851.BAS36890@m4500-03.uchicago.edu>
Message-ID:

There are 8 vdl:cleanups started (to clean up the sites) but only 7 are
logged as finishing. UFlorida-HPC is not logged as finishing.

It looks like that site scored fairly poorly all along too - it was only
given 8 jobs to run in the workflow.

--
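
The check described above (cleanups started versus cleanups logged as
finished) can be approximated with a small shell function. This is only a
sketch under an assumption: the thread quotes an END line but never a
START line, so the code assumes that starts are logged as
"vdl:cleanup START ... host=NAME" in the same layout. The helper name
unfinished_cleanups is made up and is not part of Swift.

```shell
# Print the hosts whose vdl:cleanup was started but never logged as ended.
# ASSUMPTION: START lines mirror the END line quoted in the thread, e.g.
#   ... INFO vdl:cleanup START dir=<run-dir> host=<site>
# Usage: unfinished_cleanups LOGFILE
unfinished_cleanups() {
    awk '
        /vdl:cleanup +(START|END) / {
            # Pull the site name out of the host=... field.
            host = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^host=/) host = substr($i, 6)
            if (host == "") next
            if ($0 ~ /vdl:cleanup +START /) started[host] = 1
            else ended[host] = 1
        }
        END {
            # Report every host that has a START without a matching END.
            for (h in started)
                if (!(h in ended)) print h
        }
    ' "$1" | sort
}

# Example (the log file name is a placeholder):
#   unfinished_cleanups run.log
```

A site printed by this function is one whose cleanup never reported back,
which is the first place to look when a run hangs after the final status
line.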