From aespinosa at cs.uchicago.edu Mon May 3 15:05:07 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 3 May 2010 15:05:07 -0500
Subject: [Swift-devel] ADEM script updates
Message-ID:

Hi,

I reimplemented the ADEM scripts and merged them into a unified script interface. Unit tests were also added. I took the liberty of mirroring the repository to GitHub (http://github.com/aespinosa/adem) in hopes that some random developer in the wild will contribute some patches as well.

== Usage

  adem command [options]

Examples:

  adem config --display
  adem sites --update
  adem app --avail

Further help:

  adem config -h/--help    Configure ADEM
  adem sites -h/--help     Manipulate the site list
  adem app -h/--help       Application installation
  adem help                This help message

So far the config, app and help commands work. The sites command already has some basic functionality, like generating a configuration needed to install an application.

Other stuff to do:
1. create a gem package (i.e. 'gem install adem')
2. clean the sites command interface
3. more documentation
4. example configurations and usage

For now the script I made is useful enough for me to install apps on sites that I directly specified (at least on Firefly).

-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From aespinosa at cs.uchicago.edu Tue May 4 09:35:07 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 04 May 2010 09:35:07 -0500
Subject: [Swift-devel] failed jobs while other jobs were being staged out
Message-ID: <1272983707.12927.2.camel@origin>

Hi,

I noticed this while running my workflow:

swift-r3288 cog-r2750
...
:436 Finished in previous run:20811 Finished successfully:1821 Failed but can retry:1
Progress: Stage in:131 Submitting:7 Submitted:34 Active:5 Checking status:38 Stage out:436 Finished in previous run:20811 Finished successfully:1821 Failed but can retry:1
Progress: Stage in:131 Submitting:6 Submitted:35 Active:5 Checking status:38 Stage out:436 Finished in previous run:20811 Finished successfully:1821 Failed but can retry:1
Progress: Stage in:131 Submitting:5 Submitted:36 Active:5 Checking status:38 Stage out:436 Finished in previous run:20811 Finished successfully:1821 Failed but can retry:1
Execution failed:
Progress: Stage in:130 Submitting:5 Submitted:37 Active:5 Checking status:30 Stage out:444 Failed:1 Finished in previous run:20811 Finished successfully:1821
Exception in surfeis_rspectra:
Arguments: [simulation_out_pointsX=2, simulation_out_pointsY=1, surfseis_rspectra_seismogram_units=cmpersec, surfseis_rspectra_output_units=cmpersec2, surfseis_rspectra_output_type=aa, surfseis_rspectra_apply_byteswap=no, simulation_out_timesamples=3000, simulation_out_timeskip=0.1, surfseis_rspectra_period=all, surfseis_rspectra_apply_filter_highHZ=5, in=panfs/panasas/CMS/data/engage/swift/219/175/Seismogram_TEST_219_175_0029.grm, out=panfs/panasas/CMS/data/engage/swift/219/175/PeakVals_TEST_219_175_0029.bsa]
Host: FIREFLY

When a job completely fails after several retries shouldn't swift wait for other jobs to be finished before a nonzero exit?

Thanks,
-Allan

From benc at hawaga.org.uk Tue May 4 10:55:29 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 4 May 2010 15:55:29 +0000 (GMT)
Subject: [Swift-devel] failed jobs while other jobs were being staged out
In-Reply-To: <1272983707.12927.2.camel@origin>
References: <1272983707.12927.2.camel@origin>
Message-ID:

> When a job completely fails after several retries shouldn't swift wait
> for other jobs to be finished before a nonzero exit?

That was switchable - the lazy errors setting in the config file.
Having it set one way makes it fail as soon as any job has exhausted its retries (this was good for debugging workflows) and the other makes it keep going trying all the jobs it can (this is good for production runs where you want to get as far as possible and know you're going to restart the workflow on failure).

--

From aespinosa at cs.uchicago.edu Tue May 4 15:05:15 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 04 May 2010 15:05:15 -0500
Subject: [Swift-devel] CDM tests on OSG
Message-ID: <1273003515.12927.82.camel@origin>

Hi,

Here's the result of testing CDM features on two OSG sites.

the policy file
rule .*TEST_f[x|y]_644.sgt DIRECT /osg/storage/data/engage/scec/SgtFiles/TEST
rule .*TEST_f[x|y]_644.sgt DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST
rule .*/[0-9]+/[0-9]+/.*.txt.variation.* DIRECT /osg/storage/data/engage/scec/RuptureVariations
rule .*/[0-9]+/[0-9]+/.*.txt.variation.* DIRECT /panfs/panasas/CMS/data/engage/scec/RuptureVariations
rule .* DEFAULT

sample wrapper log on host HARVARD:
Progress 2010-05-04 15:55:55.081637000-0400 LOG_START
_____________________________________________________________________________

Wrapper
_____________________________________________________________________________

Progress 2010-05-04 15:55:55.086896000-0400 SOURCE_CDM_LIB /osg/storage/data/engage/swift/postproc-tw
Job directory mode is: link on shared filesystem
DIR=jobs/d/extract-d5i5khrj
EXEC=/osg/storage/app/engage/JBSim3d/bin/jbsim3d
STDIN=
STDOUT=stdout.txt
STDERR=stderr.txt
DIRS=TEST|158/8|panfs/panasas/CMS/data/engage/swift/158/8
INF=TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt|158/8/158_8.txt.variation-s0001-h0000
OUTF=panfs/panasas/CMS/data/engage/swift/158/8/TEST_158_8_subfy.sgt|panfs/panasas/CMS/data/engage/swift
KICKSTART=
CDM_FILE=shared/fs.data
ARGS=stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 rupmodfile=158/8/158_8.txt.variation-s0001-h000
ARGC=9
Progress 2010-05-04 15:55:55.089785000-0400 CREATE_JOBDIR
Created job
directory: jobs/d/extract-d5i5khrj
Progress 2010-05-04 15:55:55.092773000-0400 CREATE_INPUTDIR
Created output directory: jobs/d/extract-d5i5khrj/TEST
Created output directory: jobs/d/extract-d5i5khrj/158/8
Created output directory: jobs/d/extract-d5i5khrj/panfs/panasas/CMS/data/engage/swift/158/8
Progress 2010-05-04 15:55:55.100016000-0400 LINK_INPUTS
CDM_POLICY: TEST/TEST_fy_644.sgt -> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles
CDM_ACTION: jobs/d/extract-d5i5khrj INPUT TEST/TEST_fy_644.sgt DIRECT /panfs/panasas/CMS/data/engage/sc
CDM[DIRECT]: Linking to /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt via jobs/d/ex
CDM[DIRECT]: /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt does not exist!
_____________________________________________________________________________

It cannot find the *.sgt files because it followed the first rule that matched.

wrapper log on FIREFLY:
Progress 2010-05-04 14:55:48.953957000-0500 LOG_START
_____________________________________________________________________________

Wrapper
_____________________________________________________________________________

Progress 2010-05-04 14:55:49.219389000-0500 SOURCE_CDM_LIB /panfs/panasas/CMS/data/engage/swift/postp
Job directory mode is: link on shared filesystem
DIR=jobs/5/extract-55i5khrj
EXEC=/panfs/panasas/CMS/app/engage/JBSim3d/bin/jbsim3d
STDIN=
STDOUT=stdout.txt
STDERR=stderr.txt
DIRS=TEST|158/5|panfs/panasas/CMS/data/engage/swift/158/5
INF=TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt|158/5/158_5.txt.variation-s0001-h0000
OUTF=panfs/panasas/CMS/data/engage/swift/158/5/TEST_158_5_subfy.sgt|panfs/panasas/CMS/data/engage/swift
KICKSTART=
CDM_FILE=shared/fs.data
ARGS=stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 rupmodfile=158/5/158_5.txt.variation-s0001-h000
ARGC=9
Progress 2010-05-04 14:55:49.543697000-0500 CREATE_JOBDIR
Created job directory: jobs/5/extract-55i5khrj
Progress 2010-05-04 14:55:51.058719000-0500 CREATE_INPUTDIR
Created output
directory: jobs/5/extract-55i5khrj/TEST
Created output directory: jobs/5/extract-55i5khrj/158/5
Created output directory: jobs/5/extract-55i5khrj/panfs/panasas/CMS/data/engage/swift/158/5
Progress 2010-05-04 14:55:51.141540000-0500 LINK_INPUTS
CDM_POLICY: TEST/TEST_fy_644.sgt -> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles
CDM_ACTION: jobs/5/extract-55i5khrj INPUT TEST/TEST_fy_644.sgt DIRECT /panfs/panasas/CMS/data/engage/sc
CDM[DIRECT]: Linking to /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt via jobs/5/ex
CDM_POLICY: TEST/TEST_fx_644.sgt -> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles
CDM_ACTION: jobs/5/extract-55i5khrj INPUT TEST/TEST_fx_644.sgt DIRECT /panfs/panasas/CMS/data/engage/sc
CDM[DIRECT]: Linking to /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fx_644.sgt via jobs/5/ex
Could not locate input file: 158/5/158_5.txt.variation-s0001-h0000

In this job, I have two input files. Since it already found the first one, it did not go on to the other input files, which also have a DIRECT policy.

From aespinosa at cs.uchicago.edu Tue May 4 15:31:10 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 04 May 2010 15:31:10 -0500
Subject: [Swift-devel] CDM tests on OSG
In-Reply-To: <1273003515.12927.82.camel@origin>
References: <1273003515.12927.82.camel@origin>
Message-ID: <1273005070.12927.84.camel@origin>

Oh, I placed the wrong policy on my *variation* files. I fixed them and all jobs on FIREFLY executed successfully. However, the problem at the HARVARD site still remains.

-Allan

On Tue, 2010-05-04 at 15:05 -0500, Allan Espinosa wrote:
> Hi,
>
> Here's the result of testing CDM features on two OSG sites.
> > the policy file > rule .*TEST_f[x|y]_644.sgt > DIRECT /osg/storage/data/engage/scec/SgtFiles/TEST > rule .*TEST_f[x|y]_644.sgt > DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST > rule .*/[0-9]+/[0-9]+/.*.txt.variation.* > DIRECT /osg/storage/data/engage/scec/RuptureVariations > rule .*/[0-9]+/[0-9]+/.*.txt.variation.* > DIRECT /panfs/panasas/CMS/data/engage/scec/RuptureVariations > rule .* DEFAULT > > > sample wrapper log on host HARVARD: > Progress 2010-05-04 15:55:55.081637000-0400 LOG_START > > _____________________________________________________________________________ > > Wrapper > _____________________________________________________________________________ > > Progress 2010-05-04 15:55:55.086896000-0400 > SOURCE_CDM_LIB /osg/storage/data/engage/swift/postproc-tw > Job directory mode is: link on shared filesystem > DIR=jobs/d/extract-d5i5khrj > EXEC=/osg/storage/app/engage/JBSim3d/bin/jbsim3d > STDIN= > STDOUT=stdout.txt > STDERR=stderr.txt > DIRS=TEST|158/8|panfs/panasas/CMS/data/engage/swift/158/8 > INF=TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt| > 158/8/158_8.txt.variation-s0001-h0000 > OUTF=panfs/panasas/CMS/data/engage/swift/158/8/TEST_158_8_subfy.sgt| > panfs/panasas/CMS/data/engage/swift > KICKSTART= > CDM_FILE=shared/fs.data > ARGS=stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 > rupmodfile=158/8/158_8.txt.variation-s0001-h000 > ARGC=9 > Progress 2010-05-04 15:55:55.089785000-0400 CREATE_JOBDIR > Created job directory: jobs/d/extract-d5i5khrj > Progress 2010-05-04 15:55:55.092773000-0400 CREATE_INPUTDIR > Created output directory: jobs/d/extract-d5i5khrj/TEST > Created output directory: jobs/d/extract-d5i5khrj/158/8 > Created output directory: > jobs/d/extract-d5i5khrj/panfs/panasas/CMS/data/engage/swift/158/8 > Progress 2010-05-04 15:55:55.100016000-0400 LINK_INPUTS > CDM_POLICY: TEST/TEST_fy_644.sgt -> > DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles > CDM_ACTION: jobs/d/extract-d5i5khrj INPUT TEST/TEST_fy_644.sgt > DIRECT 
/panfs/panasas/CMS/data/engage/sc > CDM[DIRECT]: Linking > to /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt via > jobs/d/ex > CDM[DIRECT]: /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt does not exist! > > _____________________________________________________________________________ > > From wozniak at mcs.anl.gov Tue May 4 15:52:23 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Tue, 4 May 2010 15:52:23 -0500 (Central Daylight Time) Subject: [Swift-devel] CDM tests on OSG In-Reply-To: <1273005070.12927.84.camel@origin> References: <1273003515.12927.82.camel@origin> <1273005070.12927.84.camel@origin> Message-ID: Ok, I think the big picture here is that you want to match on not just the file name but also the job location. That's something I have not totally thought through but now's a great opportunity to do that. What if the pattern match syntax also took into account the pool handle from sites.xml? Would that do the job? On Tue, 4 May 2010, Allan Espinosa wrote: > Oh, i placed the wrong policy on my *variation* files. I fixed them all > jobs in FIREFLY executed successfully. However, the problem at the > HARVARD site still remain. > > -Allan > > On Mar, 2010-05-04 at 15:05 -0500, Allan Espinosa wrote: >> Hi, >> >> Here's the result of testing CDM features on two OSG sites. 
>> >> the policy file >> rule .*TEST_f[x|y]_644.sgt >> DIRECT /osg/storage/data/engage/scec/SgtFiles/TEST >> rule .*TEST_f[x|y]_644.sgt >> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST >> rule .*/[0-9]+/[0-9]+/.*.txt.variation.* >> DIRECT /osg/storage/data/engage/scec/RuptureVariations >> rule .*/[0-9]+/[0-9]+/.*.txt.variation.* >> DIRECT /panfs/panasas/CMS/data/engage/scec/RuptureVariations >> rule .* DEFAULT >> >> >> sample wrapper log on host HARVARD: >> Progress 2010-05-04 15:55:55.081637000-0400 LOG_START >> >> _____________________________________________________________________________ >> >> Wrapper >> _____________________________________________________________________________ >> >> Progress 2010-05-04 15:55:55.086896000-0400 >> SOURCE_CDM_LIB /osg/storage/data/engage/swift/postproc-tw >> Job directory mode is: link on shared filesystem >> DIR=jobs/d/extract-d5i5khrj >> EXEC=/osg/storage/app/engage/JBSim3d/bin/jbsim3d >> STDIN= >> STDOUT=stdout.txt >> STDERR=stderr.txt >> DIRS=TEST|158/8|panfs/panasas/CMS/data/engage/swift/158/8 >> INF=TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt| >> 158/8/158_8.txt.variation-s0001-h0000 >> OUTF=panfs/panasas/CMS/data/engage/swift/158/8/TEST_158_8_subfy.sgt| >> panfs/panasas/CMS/data/engage/swift >> KICKSTART= >> CDM_FILE=shared/fs.data >> ARGS=stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 >> rupmodfile=158/8/158_8.txt.variation-s0001-h000 >> ARGC=9 >> Progress 2010-05-04 15:55:55.089785000-0400 CREATE_JOBDIR >> Created job directory: jobs/d/extract-d5i5khrj >> Progress 2010-05-04 15:55:55.092773000-0400 CREATE_INPUTDIR >> Created output directory: jobs/d/extract-d5i5khrj/TEST >> Created output directory: jobs/d/extract-d5i5khrj/158/8 >> Created output directory: >> jobs/d/extract-d5i5khrj/panfs/panasas/CMS/data/engage/swift/158/8 >> Progress 2010-05-04 15:55:55.100016000-0400 LINK_INPUTS >> CDM_POLICY: TEST/TEST_fy_644.sgt -> >> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles >> CDM_ACTION: 
jobs/d/extract-d5i5khrj INPUT TEST/TEST_fy_644.sgt >> DIRECT /panfs/panasas/CMS/data/engage/sc >> CDM[DIRECT]: Linking >> to /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt via >> jobs/d/ex >> CDM[DIRECT]: /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt does not exist! >> >> _____________________________________________________________________________ >> >> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Justin M Wozniak From aespinosa at cs.uchicago.edu Tue May 4 16:01:00 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 04 May 2010 16:01:00 -0500 Subject: [Swift-devel] CDM tests on OSG In-Reply-To: References: <1273003515.12927.82.camel@origin> <1273005070.12927.84.camel@origin> Message-ID: <1273006860.12927.87.camel@origin> I was initially thinking of having the policy description inside the handle. But having pool dependent entries on fs.data should work as well. Thanks Justin! -Allan On Mar, 2010-05-04 at 15:52 -0500, Justin M Wozniak wrote: > Ok, I think the big picture here is that you want to match on not just the > file name but also the job location. That's something I have not totally > thought through but now's a great opportunity to do that. > > What if the pattern match syntax also took into account the pool handle > from sites.xml? Would that do the job? > > On Tue, 4 May 2010, Allan Espinosa wrote: > > > Oh, i placed the wrong policy on my *variation* files. I fixed them all > > jobs in FIREFLY executed successfully. However, the problem at the > > HARVARD site still remain. > > > > -Allan > > > > On Mar, 2010-05-04 at 15:05 -0500, Allan Espinosa wrote: > >> Hi, > >> > >> Here's the result of testing CDM features on two OSG sites. 
> >> > >> the policy file > >> rule .*TEST_f[x|y]_644.sgt > >> DIRECT /osg/storage/data/engage/scec/SgtFiles/TEST > >> rule .*TEST_f[x|y]_644.sgt > >> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST > >> rule .*/[0-9]+/[0-9]+/.*.txt.variation.* > >> DIRECT /osg/storage/data/engage/scec/RuptureVariations > >> rule .*/[0-9]+/[0-9]+/.*.txt.variation.* > >> DIRECT /panfs/panasas/CMS/data/engage/scec/RuptureVariations > >> rule .* DEFAULT > >> > >> > >> sample wrapper log on host HARVARD: > >> Progress 2010-05-04 15:55:55.081637000-0400 LOG_START > >> > >> _____________________________________________________________________________ > >> > >> Wrapper > >> _____________________________________________________________________________ > >> > >> Progress 2010-05-04 15:55:55.086896000-0400 > >> SOURCE_CDM_LIB /osg/storage/data/engage/swift/postproc-tw > >> Job directory mode is: link on shared filesystem > >> DIR=jobs/d/extract-d5i5khrj > >> EXEC=/osg/storage/app/engage/JBSim3d/bin/jbsim3d > >> STDIN= > >> STDOUT=stdout.txt > >> STDERR=stderr.txt > >> DIRS=TEST|158/8|panfs/panasas/CMS/data/engage/swift/158/8 > >> INF=TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt| > >> 158/8/158_8.txt.variation-s0001-h0000 > >> OUTF=panfs/panasas/CMS/data/engage/swift/158/8/TEST_158_8_subfy.sgt| > >> panfs/panasas/CMS/data/engage/swift > >> KICKSTART= > >> CDM_FILE=shared/fs.data > >> ARGS=stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 > >> rupmodfile=158/8/158_8.txt.variation-s0001-h000 > >> ARGC=9 > >> Progress 2010-05-04 15:55:55.089785000-0400 CREATE_JOBDIR > >> Created job directory: jobs/d/extract-d5i5khrj > >> Progress 2010-05-04 15:55:55.092773000-0400 CREATE_INPUTDIR > >> Created output directory: jobs/d/extract-d5i5khrj/TEST > >> Created output directory: jobs/d/extract-d5i5khrj/158/8 > >> Created output directory: > >> jobs/d/extract-d5i5khrj/panfs/panasas/CMS/data/engage/swift/158/8 > >> Progress 2010-05-04 15:55:55.100016000-0400 LINK_INPUTS > >> CDM_POLICY: 
TEST/TEST_fy_644.sgt -> > >> DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles > >> CDM_ACTION: jobs/d/extract-d5i5khrj INPUT TEST/TEST_fy_644.sgt > >> DIRECT /panfs/panasas/CMS/data/engage/sc > >> CDM[DIRECT]: Linking > >> to /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt via > >> jobs/d/ex > >> CDM[DIRECT]: /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST/TEST_fy_644.sgt does not exist! > >> > >> _____________________________________________________________________________ From wozniak at mcs.anl.gov Tue May 4 16:01:46 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Tue, 4 May 2010 16:01:46 -0500 (Central Daylight Time) Subject: [Swift-devel] CDM tests on OSG In-Reply-To: <1273003515.12927.82.camel@origin> References: <1273003515.12927.82.camel@origin> Message-ID: On Tue, 4 May 2010, Allan Espinosa wrote: > Here's the result of testing CDM features on two OSG sites. Also, maybe we can add a Globus.org method to CDM. (Background: Allan is using an external script to stage in the data before launching Swift. Then he connects the Swift job to the data using the CDM DIRECT method.) We could add a GLOBUS.ORG method to CDM that would automate some of this. At submit time, the VDL Karajan logic would call into the external Globus.org functionality instead of the normal staging steps. If you want to see what I mean, for an example of similar functionality, take a look at the BROADCAST method. On the BG/P, this method calls out to the f2cn tool to rapidly send data to the compute nodes. (This method is somewhat complex because I tap into a Coasters callback to keep track of what is stored on which compute node, the GLOBUS.ORG method could be simpler if we don't do this.) 
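
On the pool-aware matching idea raised earlier in this thread, a minimal Python sketch of what first-match lookup keyed on both pool handle and file name could look like. The extra pool field in each rule is a HYPOTHETICAL syntax invented for illustration (the policy file posted above has no such field), the pairing of HARVARD with /osg/storage and FIREFLY with /panfs is taken from the EXEC paths in the wrapper logs, and the use of full-string regex matching is an assumption that may differ from what CDM actually does.

```python
import re

# HYPOTHETICAL syntax: "rule <pool-regex> <file-regex> <METHOD> [<directory>]".
# The real policy file in this thread has no pool field; it is added here
# only to illustrate the proposal.
POLICY = """\
rule HARVARD .*TEST_f[x|y]_644.sgt DIRECT /osg/storage/data/engage/scec/SgtFiles/TEST
rule FIREFLY .*TEST_f[x|y]_644.sgt DIRECT /panfs/panasas/CMS/data/engage/scec/SgtFiles/TEST
rule .* .* DEFAULT
"""


def parse_policy(text):
    """Turn policy text into a list of (pool_re, file_re, method, directory)."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "rule":
            continue  # skip blank lines and anything that is not a rule
        pool_re, file_re, method = fields[1:4]
        directory = fields[4] if len(fields) > 4 else None
        rules.append((re.compile(pool_re), re.compile(file_re), method, directory))
    return rules


def lookup(rules, pool, filename):
    """Return (method, directory) of the first rule matching pool AND file."""
    for pool_re, file_re, method, directory in rules:
        if pool_re.fullmatch(pool) and file_re.fullmatch(filename):
            return method, directory
    return "DEFAULT", None


if __name__ == "__main__":
    rules = parse_policy(POLICY)
    for pool in ("HARVARD", "FIREFLY"):
        print(pool, lookup(rules, pool, "TEST/TEST_fy_644.sgt"))
```

With rules like these the same file name resolves to a different DIRECT directory on each site, and rule order only matters among rules for the same pool.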
--
Justin M Wozniak

From wilde at mcs.anl.gov Tue May 4 16:42:27 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 4 May 2010 16:42:27 -0500 (CDT)
Subject: [Swift-devel] NPE in coaster block processor
Message-ID: <12031136.877421273009347214.JavaMail.root@zimbra>

I see this in one of Aashish's runs:

Exception caught in block processor
java.lang.NullPointerException

All files and logs are in:

/home/aashish/CASP/T0517/run.raptorloops.1258

Traceback below.

- Mike

Exception caught in block processor
java.lang.NullPointerException
    at org.globus.cog.abstraction.coaster.service.job.manager.Block.fits(Block.java:128)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.fits(BlockQueueProcessor.java:182)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.queueToExistingBlocks(BlockQueueProcessor.java:202)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:439)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:78)
Exception caught in block processor
java.lang.NullPointerException
    at org.globus.cog.abstraction.coaster.service.job.manager.Block.fits(Block.java:128)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.fits(BlockQueueProcessor.java:182)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.queueToExistingBlocks(BlockQueueProcessor.java:202)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:439)
    at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:78)
Cleaning up...
Shutting down service at https://172.5.86.6:50002 Got channel MetaChannel: 1361940173 -> null +Progress: Submitted:228 Active:8 Finished successfully:65 Failed but can retry:1 -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Sun May 9 14:48:53 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Sun, 9 May 2010 14:48:53 -0500 Subject: [Swift-devel] swift-plot-log parsing errors Message-ID: <20100509194853.GA23683@origin> Hi, I get these parsing errors: $swift-plot-log ~aespinosa/workflows/cybershake/postproc-3sites_TEST.log ... ... (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error (standard_in) 1: parse error kickstarts-to-event > kickstart.event execution-summaries > execution-counts.txt table-jobs-sites > jobs-sites.html per-site-execute2-durations > site-duration.txt cat execute2.transitions | swap-and-sort |last-transition-line > execute2.last cat execute2.last | sed 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\)\(.*\)/\3/' | sort | uniq -c > execute2.lastsummary cat execute.transitions | swap-and-sort |last-transition-line > execute.last cat execute.last | sed 's/^\([^ ]*\) \([^ ]*\) \([^ ]*\)\(.*\)/\3/' | sort | uniq -c > execute.lastsummary cat execute2.event | cut -f 5 -d ' ' | sort | uniq -c | sort | sed 's/^ *\(.*\) .*$/\1/' | uniq -c > jobs.retrycount.summary cat execute.event | cut -f 5 -d ' ' 
| sort | uniq -c > trname-summary
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//index.html.template > index.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//execute2.html.template > execute2.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//execute.html.template > execute.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//karajan.html.template > karajan.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//info.html.template > info.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//assorted.html.template > assorted.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//kickstart.html.template > kickstart.html
/home/aespinosa/swift/current/bin/../libexec/log-processing//kickstart.html.template:23: m4: Cannot open kickstart.stats: No such file or directory
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//falkon.html.template > falkon.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//scheduler.html.template > scheduler.html
m4 -I/home/aespinosa/swift/current/bin/../libexec/log-processing/ /home/aespinosa/swift/current/bin/../libexec/log-processing//coasters.html.template > coasters.html
rm karatasks.FILE_OPERATION.sorted-start.event karatasks.JOB_SUBMISSION.Active.event karatasks.JOB_SUBMISSION.Queue.sorted-start.event karatasks.last
dostageout.sorted-start.event karatasks.JOB_SUBMISSION.seenstates karatasks.FILE_TRANSFER.sorted-start.event execute.last karatasks.JOB_SUBMISSION.Queue.eip initshareddir.event karatasks.JOB_SUBMISSION.Active.sorted-start.event execute.seenstates karatasks.FILE_OPERATION.seenstates karatasks.JOB_SUBMISSION.eip karatasks.JOB_SUBMISSION.Queue.sorted-by-duration karatasks.FILE_OPERATION.eip createdirset.event karatasks.JOB_SUBMISSION.event dostageout.sorted-by-duration karatasks.FILE_TRANSFER.seenstates karatasks.FILE_OPERATION.event karatasks.JOB_SUBMISSION.Queue.event execute2.seenstates karatasks.FILE_TRANSFER.eip dostagein.sorted-by-duration karatasks.JOB_SUBMISSION.Active.sorted-by-duration karatasks.JOB_SUBMISSION.Active.eip karatasks.FILE_TRANSFER.event execute2.last execute.sorted-start.event karatasks.JOB_SUBMISSION.sorted-start.event cp: cannot stat `clean': No such file or directory cp: cannot stat `webpage.kara': No such file or directory cp: cannot stat `webpage': No such file or directory I was thinking it was because of some swift.properties that we don't typically use.: $ diff /home/aespinosa/swift/current/etc/swift.properties swift.properties 44c44 < lazy.errors=false --- > lazy.errors=true 99c99 < clustering.enabled=false --- > clustering.enabled=true 114c114 < clustering.min.time=60 --- > clustering.min.time=360 152c152 < wrapperlog.always.transfer=false --- > wrapperlog.always.transfer=true 241c241 < sitedir.keep=false --- > sitedir.keep=true 247c247 < execution.retries=2 --- > execution.retries=0 263c263 < replication.min.queue.time=60 --- > replication.min.queue.time=180 322c322 < foreach.max.threads=16384 --- > foreach.max.threads=1024 352c352 < use.provider.staging=false \ No newline at end of file --- > use.provider.staging=false I enabled job replicas as well before but disabled them back again so I can isolate what's really going wrong. 
-Allan

From benc at hawaga.org.uk Sun May 9 14:54:05 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 9 May 2010 19:54:05 +0000 (GMT)
Subject: [Swift-devel] swift-plot-log parsing errors
In-Reply-To: <20100509194853.GA23683@origin>
References: <20100509194853.GA23683@origin>
Message-ID:

> I get these parsing errors:
> $swift-plot-log ~aespinosa/workflows/cybershake/postproc-3sites_TEST.log
> ...
> ...
> (standard_in) 1: parse error

Look for the make target that this is part of.

i.e. read backwards in the '...' above until you find which file is being parsed.

At least as I left it a year ago, the log plotting code was pretty fragile wrt unexpected things, but you should be able to track down what's going on with some poking.

--

From aespinosa at cs.uchicago.edu Sun May 9 20:41:03 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Sun, 9 May 2010 20:41:03 -0500
Subject: [Swift-devel] swift-plot-log parsing errors
In-Reply-To:
References: <20100509194853.GA23683@origin>
Message-ID: <20100510014103.GA1975@origin>

I forgot to log the output of the build previously, so I had to rerun the process. Anyway, here's the target:

...
cat dostagein.event | sort -n -k 2 > dostagein.sorted-by-duration
plot-duration-histogram dostagein.sorted-by-duration dostagein-duration-histogram.png
cat dostageout.event | sort -n -k 2 > dostageout.sorted-by-duration
plot-duration-histogram dostageout.sorted-by-duration dostageout-duration-histogram.png
cat execute.transitions | sed 's/[^ ]* *[^ ]* \([^ ]*\).*/\1/' | sort | uniq > execute.seenstates
trail execute
cat execute2.transitions | sed 's/[^ ]* *[^ ]* \([^ ]*\).*/\1/' | sort | uniq > execute2.seenstates
trail execute2
info-and-karajan-actives /home/aespinosa/workflows/cybershake/postproc-3sites_TEST.d/
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
(standard_in) 1: parse error
...

I guess from here the series of targets is webpage -> pngs -> info-and-actives.png -> execute2.event, karatasks.transitions

On Sun, May 09, 2010 at 07:54:05PM +0000, Ben Clifford wrote:
>
> > I get these parsing errors:
>
> Look for the make target that this is part of.
>
> i.e. read backwards in the '...' above until you find which file is being
> parsed.
>
> At least as I left it a year ago, the log plotting code was pretty fragile
> wrt unexpected things, but you should be able to track down what's going on
> with some poking.
>
> --
>
>

From benc at hawaga.org.uk Mon May 10 02:35:52 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 10 May 2010 07:35:52 +0000 (GMT)
Subject: [Swift-devel] swift-plot-log parsing errors
In-Reply-To: <20100510014103.GA1975@origin>
References: <20100509194853.GA23683@origin> <20100510014103.GA1975@origin>
Message-ID:

> info-and-karajan-actives

That's right before the errors.

Here's what it's documented as doing (at least in r3264) at the top:

# for every job which has an info file,
# create two columns, one being the start time of the job according
# to the info file and the other being the Active time according to
# karajan

It needs both info files and a successfully generated execute2.event file, by the looks of it (and possibly other files).

This file may be irrelevant to you and so you can perhaps stop it being generated.

The purpose of it was to get some feeling for how well the live-reported job-active notifications match up with the post-run collected info files. Depending on the execution path from CoG to the worker node (e.g. GRAM and the queueing system), there could be very large time differences (in either direction).
The -info files are generally more accurate but you don't know them during the run. The live reported notifications were used to update eg. the console status ticker - but if the underlying notifications are wrong then the ticker is also wrong. -- http://www.hawaga.org.uk/ben/ From aespinosa at cs.uchicago.edu Mon May 10 17:24:25 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 10 May 2010 17:24:25 -0500 Subject: [Swift-devel] Request: doStage(In|Out) plots colored per site Message-ID: <20100510222425.GC4588@origin> Hi, I would like for swift-plot-log to generate transfer time statistics per site so I added the following modifications to the trunk tree diff --git a/libexec/log-processing/log-to-dostagein-transitions b/libexec/log- index 16d0547..f4e8707 100755 --- a/libexec/log-processing/log-to-dostagein-transitions +++ b/libexec/log-processing/log-to-dostagein-transitions @@ -12,7 +12,7 @@ grep ' vdl:dostagein ' | iso-to-secs | \ grep -E '^[^ ]+ +[^ ]+ +vdl:dostagein ' | \ sed 's/\([^ ]*\) [^ ]* *vdl:dostagein FILE_STAGE_IN_.*$//' | \ -sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) -.*$/\1 \3 \2/' +sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) host=\([^ ]*\) grep -E '^[^$]' exit 0 diff --git a/libexec/log-processing/log-to-dostageout-transitions b/libexec/log index 07e98e0..5653527 100755 --- a/libexec/log-processing/log-to-dostageout-transitions +++ b/libexec/log-processing/log-to-dostageout-transitions @@ -12,7 +12,7 @@ grep ' vdl:dostageout ' | iso-to-secs | \ grep -E '^[^ ]+ +[^ ]+ +vdl:dostageout ' | \ sed 's/\([^ ]*\) [^ ]* *vdl:dostageout FILE_STAGE_OUT_.*$//' | \ -sed 's/\([^ ]*\) INFO vdl:dostageout \([^ ]*\) jobid=\([^ ]*\) -.*$/\1 \3 \2/ +sed 's/\([^ ]*\) INFO vdl:dostageout \([^ ]*\) jobid=\([^ ]*\) host=\([^ ]*\) grep -E '^[^$]' exit 0 diff --git a/libexec/vdl-int.k b/libexec/vdl-int.k index 440617c..aa78e7d 100644 --- a/libexec/vdl-int.k +++ b/libexec/vdl-int.k @@ -295,7 +295,7 @@ namespace("vdl" 
doStageinFile(provider=provider, srchost=srchos srcdir=srcdir, desthost=host, d ) - log(LOG:INFO, "END jobid={jobid} - Staging in finished" + log(LOG:INFO, "END jobid={jobid} host={host}- Staging i ) By the looks of the swift-plot-log tree, some colour-dostage(in|out) scripts should be made to parse these. I was wondering if there are other solutions to getting site transfer stats aside from this proposed hack like querying provenance information or doing site query per jobid to existing *.event logs so I can reuse my old runs. -Allan From benc at hawaga.org.uk Tue May 11 01:47:37 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 11 May 2010 06:47:37 +0000 (GMT) Subject: [Swift-devel] Request: doStage(In|Out) plots colored per site In-Reply-To: <20100510222425.GC4588@origin> References: <20100510222425.GC4588@origin> Message-ID: > I was wondering if there are other solutions to getting site transfer stats > aside from this proposed hack like querying provenance information or doing > site query per jobid to existing *.event logs so I can reuse my old runs. I think the jobid in the dostage lines implies the site. They should correspond with the job id in this line: log(LOG:DEBUG, "THREAD_ASSOCIATION jobid={jobid} thread={#thread} host={rhost} replicationGroup={replicationGroup}") in vdl-int.k Something does per-site coloured graphs already but I don't remember what. Whatever that does, you can probably adapt to this. -- From aespinosa at cs.uchicago.edu Tue May 11 20:00:00 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 11 May 2010 20:00:00 -0500 Subject: [Swift-devel] Request: doStage(In|Out) plots colored per site In-Reply-To: References: <20100510222425.GC4588@origin> Message-ID: <20100512010000.GA7213@origin> The line "log(LOG:DEBUG, "THREAD_ASSOCIATION jobid={jobid} thread={#thread} host={rhost} replicationGroup={replicationGroup}")" in vdl-int.k is inside an execute2 element, so the log-to-dostagein-transitions script ignores it.
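As a rough illustration of the jobid-to-site correspondence described above (a sketch only, not part of the log-processing scripts): assuming the log has already been through iso-to-secs so every line starts with a numeric timestamp, the THREAD_ASSOCIATION lines give a jobid-to-host map that can then be applied to the dostagein lines. The sample lines, the job id, the START message, and the function name are all made up for illustration; real logs may look different.

```python
import re

# Hypothetical sample lines in the post-iso-to-secs shape (not from a real run).
SAMPLE_LOG = """\
1269611776.100 DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=sleep-abc123 thread=0-1-2 host=FIREFLY replicationGroup=xyz
1269611776.500 INFO vdl:dostagein START jobid=sleep-abc123 - Staging in files
1269611776.560 INFO vdl:dostagein END jobid=sleep-abc123 - Staging in finished
"""

# jobid and host as they appear in the THREAD_ASSOCIATION message
ASSOC = re.compile(r"THREAD_ASSOCIATION jobid=(\S+) thread=\S+ host=(\S+)")
# timestamp, state and jobid of a dostagein transition line
STAGE = re.compile(r"^(\S+) INFO vdl:dostagein (\S+) jobid=(\S+) ")


def stagein_with_host(log_text):
    """Return (timestamp, jobid, state, host) for each dostagein line."""
    lines = log_text.splitlines()

    # First pass: collect the jobid -> host associations.
    hosts = {}
    for line in lines:
        m = ASSOC.search(line)
        if m:
            hosts[m.group(1)] = m.group(2)

    # Second pass: annotate each stage-in transition with its host.
    rows = []
    for line in lines:
        m = STAGE.match(line)
        if m:
            ts, state, jobid = m.groups()
            rows.append((ts, jobid, state, hosts.get(jobid, "UNKNOWN")))
    return rows


for row in stagein_with_host(SAMPLE_LOG):
    print(" ".join(row))
```

Doing the lookup in two passes means the association line does not have to appear before the stage-in lines, and it works on logs from old runs without changing vdl-int.k.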
After poking around, I'm going to test the modification below: diff --git a/libexec/log-processing/log-to-dostagein-transitions b/libexec/log-p index 16d0547..7e42bf6 100755 --- a/libexec/log-processing/log-to-dostagein-transitions +++ b/libexec/log-processing/log-to-dostagein-transitions @@ -9,10 +9,11 @@ # 1193169216.993 DEBUG vdl:dostagein FILE_STAGE_IN_START file=file://localhost/ -grep ' vdl:dostagein ' | iso-to-secs | \ +perl -wn -e '/vdl:dostagein|vdl:execute2/ and print' | iso-to-secs | \ grep -E '^[^ ]+ +[^ ]+ +vdl:dostagein ' | \ sed 's/\([^ ]*\) [^ ]* *vdl:dostagein FILE_STAGE_IN_.*$//' | \ -sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) -.*$/\1 \3 \2/' +sed 's/^\(.*\) DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=\([^ ]*\) thread=\([ +sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) host=\([^ ]*\) - grep -E '^[^$]' exit 0 Take note that I have yet to test this :) -Allan On Tue, May 11, 2010 at 06:47:37AM +0000, Ben Clifford wrote: > > > I was wondering if there are other solutions to getting site transfer stats > > aside from this proposed hack like querying provenance information or doing > > site query per jobid to existing *.event logs so I can reused my old runs. > > I think the jobid in the dostage lines imply the site. They should > correspond with the job id in this line: > > log(LOG:DEBUG, "THREAD_ASSOCIATION > jobid={jobid} thread={#thread} host={rhost} > replicationGroup={replicationGroup}") > > in vdl-int.k > > Something does per-site coloured graphs already but I don't remember what. > Whatever that does, you can probably adapt to this. 
> > -- > > From aespinosa at cs.uchicago.edu Tue May 11 20:36:46 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 11 May 2010 20:36:46 -0500 Subject: [Swift-devel] Request: doStage(In|Out) plots colored per site In-Reply-To: <20100512010000.GA7213@origin> References: <20100510222425.GC4588@origin> <20100512010000.GA7213@origin> Message-ID: <20100512013646.GA13552@origin> Ok that didn't work. But i think i got it this time: diff --git a/libexec/log-processing/log-to-dostagein-transitions b/libexec/log-p index 16d0547..b82be5f 100755 --- a/libexec/log-processing/log-to-dostagein-transitions +++ b/libexec/log-processing/log-to-dostagein-transitions @@ -9,8 +9,9 @@ # 1193169216.993 DEBUG vdl:dostagein FILE_STAGE_IN_START file=file://localhost/ -grep ' vdl:dostagein ' | iso-to-secs | \ -grep -E '^[^ ]+ +[^ ]+ +vdl:dostagein ' | \ +grep -E ' (vdl:dostagein|vdl:execute2) ' | iso-to-secs | \ +grep -E '^[^ ]+ +[^ ]+ +(vdl:dostagein |vdl:execute2 THREAD_ASSOCIATION)' | \ +sed 's/^\(.*\) DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=\([^ ]*\) thread=\([ sed 's/\([^ ]*\) [^ ]* *vdl:dostagein FILE_STAGE_IN_.*$//' | \ sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) -.*$/\1 \3 \2/' grep -E '^[^$]' sample output: $ tail dostagein.event 1269611776.560 0.0420000553131104 surfeis_rspectra-j8pe0mpj END 0-10-7-13-1-4-1 FIREFLY y7pe0mpj 1269611811.671 0.0469999313354492 surfeis_rspectra-k8pe0mpj END 0-10-7-13-2-4-1 FIREFLY 08pe0mpj 1269611811.676 0.0789999961853027 surfeis_rspectra-l8pe0mpj END 0-10-5-13-1-4-1 FIREFLY 28pe0mpj 1269611823.316 0.065000057220459 surfeis_rspectra-m8pe0mpj END 0-10-9-13-2-4-1 FIREFLY 48pe0mpj 1269611867.418 0.0410001277923584 surfeis_rspectra-n8pe0mpj END 0-10-5-13-2-4-1 FIREFLY 98pe0mpj 1269611871.688 0.0490000247955322 surfeis_rspectra-o8pe0mpj END 0-10-9-13-1-4-1 FIREFLY b8pe0mpj 1269611871.689 0.0870001316070557 surfeis_rspectra-p8pe0mpj END 0-10-2-13-1-4-1 FIREFLY f8pe0mpj 1269611871.696 0.122999906539917 
surfeis_rspectra-q8pe0mpj END 0-10-2-13-2-4-1 FIREFLY h8pe0mpj 1269611031.646 0.064000129699707 surfeis_rspectra-v7pe0mpj END 0-10-6-13-2-4-1 FIREFLY 67pe0mpj 1269611038.059 0.0799999237060547 surfeis_rspectra-z7pe0mpj END 0-10-6-13-1-4-1 FIREFLY 87pe0mpj Site info at the second to the last column. yay! :) -Allan On Tue, May 11, 2010 at 08:00:00PM -0500, Allan Espinosa wrote: > The line "log(LOG:DEBUG, "THREAD_ASSOCIATION jobid={jobid} thread={#thread} > host={rhost}" replicationGroup={replicationGroup}") in vdl-int.k is inside an > execute2 element so the log-to-dostagein-transitions script ignores this. > > After poking around, I'm going to test the modification below: > > diff --git a/libexec/log-processing/log-to-dostagein-transitions b/libexec/log-p > index 16d0547..7e42bf6 100755 > --- a/libexec/log-processing/log-to-dostagein-transitions > +++ b/libexec/log-processing/log-to-dostagein-transitions > @@ -9,10 +9,11 @@ > > # 1193169216.993 DEBUG vdl:dostagein FILE_STAGE_IN_START file=file://localhost/ > > -grep ' vdl:dostagein ' | iso-to-secs | \ > +perl -wn -e '/vdl:dostagein|vdl:execute2/ and print' | iso-to-secs | \ > grep -E '^[^ ]+ +[^ ]+ +vdl:dostagein ' | \ > sed 's/\([^ ]*\) [^ ]* *vdl:dostagein FILE_STAGE_IN_.*$//' | \ > -sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) -.*$/\1 \3 \2/' > +sed 's/^\(.*\) DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=\([^ ]*\) thread=\([ > +sed 's/\([^ ]*\) INFO vdl:dostagein \([^ ]*\) jobid=\([^ ]*\) host=\([^ ]*\) - > grep -E '^[^$]' From benc at hawaga.org.uk Wed May 12 01:58:25 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 12 May 2010 06:58:25 +0000 (GMT) Subject: [Swift-devel] Request: doStage(In|Out) plots colored per site In-Reply-To: <20100512010000.GA7213@origin> References: <20100510222425.GC4588@origin> <20100512010000.GA7213@origin> Message-ID: > After poking around, I'm going to test the modification below: [...] That will probably work. 
In my original vague plan, I would have done it differently though: I would have made sure that the host data is available in the execute2.events file (I think it is), and then I would have done a join based on the job ID between the existing stagein.events and execute2.events (like in that info-vs-karajan actives file you were looking at the other day). -- From aespinosa at cs.uchicago.edu Mon May 17 17:41:55 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 17 May 2010 17:41:55 -0500 Subject: [Swift-devel] provider-condor tempfiles Message-ID: Hi, I am trying to figure out what was wrong with the submit files the condor provider generated. stderr.txt: stdout.txt: ---- Caused by: Cannot submit job: Could not submit job (condor_submit reported an exit code of 1). Submitting job(s) ERROR: Failed to parse command file (line 3). Unfortunately, the ~/.globus/scripts/condor*.submit files get deleted right away. I checked the condor provider source tree and it seems that the call to File.deleteOnExit() isn't being done in the provider. Is there any other place I should be checking where it deletes the submit files? Thanks, -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Mon May 17 19:14:29 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 17 May 2010 19:14:29 -0500 Subject: [Swift-devel] provider-condor submit file generation Message-ID: I was poking around the provider-condor source tree today. provider-condor/src/org/globus/cog/abstraction/impl/execution/condor/DescriptionFileGenerator.java:33+: ... 
private static void constructDescriptionFile(File descriptionFile, Task task) throws IOException { JobSpecification specification = (JobSpecification) task .getSpecification(); FileWriter fileWriter = new FileWriter(descriptionFile); fileWriter.write("#####################################\n"); fileWriter.write("# Task ID: " + task.getIdentity().toString() + "\n"); fileWriter.write("#####################################\n\n"); String executable = specification.getExecutable(); if (executable != null) { fileWriter.write("Executable = " + executable + "\n"); } String argumentString = specification.getArgumentsAsString(); argumentString = argumentString.replaceAll("\\\"", "\\\\\""); if (argumentString != null) { fileWriter.write("Arguments = " + argumentString + "\n"); } .. But when I looked at a generated condor-g job in my workflow i got: universe = grid grid_resource = gt2 ff-grid.unl.edu/jobmanager stream_output = False stream_error = False Transfer_Executable = false output = /home/aespinosa/workflows/cybershake/condorg/res_testjob.submit.stdout error = /home/aespinosa/workflows/cybershake/condorg/res_testjob.submit.stderr log = /home/aespinosa/workflows/cybershake/condorg/res_testjob.submit.log remote_initialdir = /panfs/panasas/CMS/data/engage-aespinosa/swift/postproc-fireflyg_small executable = /bin/bash arguments = /panfs/panasas/CMS/data/engage-aespinosa/swift/postproc-fireflyg_small/shared/_swiftwrap extract-mo6464sj -jobdir m -scratch -e /panfs/panasas/CMS/app/engage-aespinosa/JBSim3d/bin/jbsim3d -out stdout.txt -err stderr.txt -i -d TEST|158/0|panfs/panasas/CMS/data/engage-aespinosa/swift/158/0 -if TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt|158/0/158_0.txt.variation-s0001-h0000 -of panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt|panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfx.sgt -k -cdmfile fs.data -status files -a stat=TEST extract_sgt=1 slon=-118.286 slat=34.0192 
rupmodfile=158/0/158_0.txt.variation-s0001-h0000 sgt_xfile=TEST/TEST_fx_644.sgt sgt_yfile=TEST/TEST_fy_644.sgt extract_sgt_xfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfx.sgt extract_sgt_yfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt notification = Never leave_in_queue = TRUE queue I was at least expecting the line to start with '##### ...\n # Task : ...'. Is there another place I should poke around to figure out the jobspec to condor submit file? Like where does "jobType=grid" get translated to "Universe=grid"? Thanks! -Allan From aespinosa at cs.uchicago.edu Mon May 17 19:40:26 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 17 May 2010 19:40:26 -0500 Subject: [Swift-devel] Re: provider-condor submit file generation In-Reply-To: References: Message-ID: Just to confirm, the provider does the job removal itself? > leave_in_queue = TRUE -Allan 2010/5/17 Allan Espinosa : > I was poking around the provider-condor source tree today. > > provider-condor/src/org/globus/cog/abstraction/impl/execution/condor/DescriptionFileGenerator.java:33+: > ... /TEST_158_0_subfx.sgt > extract_sgt_yfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt > notification = Never > leave_in_queue = TRUE > queue > > I was at least expecting the line to start with '##### ...\n # > Task : ...'. Is there another place I should poke around to figure > out the jobspec to condor submit file? Like where does "jobType=grid" > get translated to "Universe=grid"? > From hategan at mcs.anl.gov Mon May 17 19:56:36 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 17 May 2010 19:56:36 -0500 Subject: [Swift-devel] provider-condor submit file generation In-Reply-To: References: Message-ID: <1274144196.24742.9.camel@localhost> I would strongly encourage the use of a debugger in such cases. In the case below an entirely different provider is used. It's in provider-localscheduler.
The relevant files are in org.globus.cog.abstraction.impl.scheduler.condor. The part that deletes the script file is in org.globus.cog.abstraction.impl.scheduler.common. Somewhere in start(). I'm not sure, but I think Mike has added a condition around file removal, which disables it if you say "debug=true" in provider-condor.properties. On Mon, 2010-05-17 at 19:14 -0500, Allan Espinosa wrote: > I was poking around the provider-condor source tree today. > > provider-condor/src/org/globus/cog/abstraction/impl/execution/condor/DescriptionFileGenerator.java:33+: > ... > private static void constructDescriptionFile(File descriptionFile, > Task task) > throws IOException { > JobSpecification specification = (JobSpecification) task > .getSpecification(); > FileWriter fileWriter = new FileWriter(descriptionFile); > fileWriter.write("#####################################\n"); > fileWriter.write("# Task ID: " + task.getIdentity().toString() + "\n"); > fileWriter.write("#####################################\n\n"); > > String executable = specification.getExecutable(); > if (executable != null) { > fileWriter.write("Executable = " + executable + "\n"); > } > > String argumentString = specification.getArgumentsAsString(); > argumentString = argumentString.replaceAll("\\\"", "\\\\\""); > if (argumentString != null) { > fileWriter.write("Arguments = " + argumentString + "\n"); > } > .. 
> > > But when I looked at a generated condor-g job in my workflow i got: > universe = grid > grid_resource = gt2 ff-grid.unl.edu/jobmanager > stream_output = False > stream_error = False > Transfer_Executable = false > output = /home/aespinosa/workflows/cybershake/condorg/res_testjob.submit.stdout > error = /home/aespinosa/workflows/cybershake/condorg/res_testjob.submit.stderr > log = /home/aespinosa/workflows/cybershake/condorg/res_testjob.submit.log > > remote_initialdir = > /panfs/panasas/CMS/data/engage-aespinosa/swift/postproc-fireflyg_small > executable = /bin/bash > arguments = /panfs/panasas/CMS/data/engage-aespinosa/swift/postproc-fireflyg_small/shared/_swiftwrap > extract-mo6464sj -jobdir m -scratch -e > /panfs/panasas/CMS/app/engage-aespinosa/JBSim3d/bin/jbsim3d -out > stdout.txt -err stderr.txt -i -d > TEST|158/0|panfs/panasas/CMS/data/engage-aespinosa/swift/158/0 -if > TEST/TEST_fy_644.sgt|TEST/TEST_fx_644.sgt|158/0/158_0.txt.variation-s0001-h0000 > -of panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt|panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfx.sgt > -k -cdmfile fs.data -status files -a stat=TEST extract_sgt=1 > slon=-118.286 slat=34.0192 > rupmodfile=158/0/158_0.txt.variation-s0001-h0000 > sgt_xfile=TEST/TEST_fx_644.sgt sgt_yfile=TEST/TEST_fy_644.sgt > extract_sgt_xfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfx.sgt > extract_sgt_yfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt > notification = Never > leave_in_queue = TRUE > queue > > I was at least expecting to the the line to start with '##### ...\n # > Task : ..." . Is there another place I should poke around to figure > out the jobspec to condor submit file? Like where does "jobType=grid" > get translated to "Universe=grid"? > > Thanks! 
> -Allan > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Mon May 17 20:00:42 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 17 May 2010 20:00:42 -0500 Subject: [Swift-devel] Re: provider-condor submit file generation In-Reply-To: References: Message-ID: <1274144442.24742.14.camel@localhost> On Mon, 2010-05-17 at 19:40 -0500, Allan Espinosa wrote: > Just to confirm, the provider does the job removal itself? > > > leave_in_queue = TRUE There's a bunch of relevant stuff in QueuePoller.removeDoneJob. There's something else in CondorExecutor: if ("true".equals(spec.getAttribute("holdIsFailure"))) { wr.write("periodic_remove = JobStatus == 5\n"); } Which may perhaps be extended. The thing with letting condor remove the job automatically is that the exit code may not be detected. On the other had there may have been some attempts to use condor log files to process job information rather than polling the queue. I'm not sure to what extent those are in SVN. > > -Allan > > 2010/5/17 Allan Espinosa : > > I was poking around the provider-condor source tree today. > > > > provider-condor/src/org/globus/cog/abstraction/impl/execution/condor/DescriptionFileGenerator.java:33+: > > ... > > /TEST_158_0_subfx.sgt > > extract_sgt_yfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt > > notification = Never > > leave_in_queue = TRUE > > queue > > > > I was at least expecting to the the line to start with '##### ...\n # > > Task : ..." . Is there another place I should poke around to figure > > out the jobspec to condor submit file? Like where does "jobType=grid" > > get translated to "Universe=grid"? 
> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From aespinosa at cs.uchicago.edu Mon May 17 20:31:32 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 17 May 2010 20:31:32 -0500 Subject: [Swift-devel] Re: provider-condor submit file generation In-Reply-To: <1274144442.24742.14.camel@localhost> References: <1274144442.24742.14.camel@localhost> Message-ID: Ah looking at the provider-localscheduler tree, everything makes sense now :) I wonder how long before swift starts to remove the completed jobs now? $ condor_q -- Submitter: communicado.ci.uchicago.edu : <128.135.125.17:44838> : communicado.ci.uchicago.edu ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD 11.0 aespinosa 5/17 20:05 0+00:04:14 C 0 1.0 bash /panfs/panasa 12.0 aespinosa 5/17 20:05 0+00:03:57 C 0 1.0 bash /panfs/panasa 0 jobs; 0 idle, 0 running, 0 held According to my logs, swift has been polling for around 20 minutes no 2010-05-17 20:05:13,841-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:0-12-1-6-1-1 2010-05-17 20:05:14,092-0500 INFO AbstractQueuePoller Active: 0, New: 2, Done: 0 2010-05-17 20:05:19,163-0500 INFO AbstractQueuePoller Active: 2, New: 0, Done: 0 ... ... 2010-05-17 20:25:42,473-0500 INFO AbstractQueuePoller Active: 2, New: 0, Done: 0 2010-05-17 20:25:47,548-0500 INFO AbstractQueuePoller Active: 2, New: 0, Done: 0 A snippet from the *info logs suggests that the jobs have finished much much earlier: ... Progress 2010-05-17 20:05:36.239957000-0500 EXECUTE Moving back to workflow directory /panfs/panasas/CMS/data/engage-aespinosa/swift/postproc-fireflyg_small Progress 2010-05-17 20:07:09.642855000-0500 EXECUTE_DONE Job ran successfully Progress 2010-05-17 20:07:09.655935000-0500 MOVING_OUTPUTS panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt|panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfx.sgt ... 
The gridmanager finished around two minutes after. This is probably the time condor_q reported a 'DONE' status on the jobs: ... 5/17 20:09:47 [18799] (11.0) doEvaluateState called: gmState GM_CHECK_OUTPUT, globusState 8 05/17 20:09:49 [18799] (11.0) doEvaluateState called: gmState GM_DONE_SAVE, globusState 8 05/17 20:09:49 [18799] (11.0) doEvaluateState called: gmState GM_DONE_COMMIT, globusState 8 05/17 20:09:54 [18799] No jobs left, shutting down 05/17 20:09:54 [18799] Got SIGTERM. Performing graceful shutdown. 05/17 20:09:54 [18799] **** condor_gridmanager (condor_GRIDMANAGER) pid 18799 EXITING WITH STATUS 0 2010/5/17 Mihael Hategan : > On Mon, 2010-05-17 at 19:40 -0500, Allan Espinosa wrote: >> Just to confirm, the provider does the job removal itself? >> >> > leave_in_queue = TRUE > > There's a bunch of relevant stuff in QueuePoller.removeDoneJob. > > There's something else in CondorExecutor: > if ("true".equals(spec.getAttribute("holdIsFailure"))) { > wr.write("periodic_remove = JobStatus == 5\n"); > } > > Which may perhaps be extended. The thing with letting condor remove the > job automatically is that the exit code may not be detected. On the > other hand there may have been some attempts to use condor log files to > process job information rather than polling the queue. I'm not sure to > what extent those are in SVN. > > >> >> -Allan >> >> 2010/5/17 Allan Espinosa : >> > I was poking around the provider-condor source tree today. >> > >> > provider-condor/src/org/globus/cog/abstraction/impl/execution/condor/DescriptionFileGenerator.java:33+: >> > ... >> >> /TEST_158_0_subfx.sgt >> > extract_sgt_yfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt >> > notification = Never >> > leave_in_queue = TRUE >> > queue >> > >> > I was at least expecting the line to start with '##### ...\n # >> > Task : ...'. Is there another place I should poke around to figure >> > out the jobspec to condor submit file?
Like where does "jobType=grid" >> > get translated to "Universe=grid"? >> > From hategan at mcs.anl.gov Mon May 17 21:24:21 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 17 May 2010 21:24:21 -0500 Subject: [Swift-devel] Re: provider-condor submit file generation In-Reply-To: References: <1274144442.24742.14.camel@localhost> Message-ID: <1274149461.26791.3.camel@localhost> On Mon, 2010-05-17 at 20:31 -0500, Allan Espinosa wrote: > Ah looking at the provider-localscheduler tree, everything makes sense now :) > > I wonder how long before swift starts to remove the completed jobs now? As soon as the queue is polled and it figures out that the job is done. So a minimum of zero and a maximum of the poll interval (of 5 seconds by default) plus whatever time it takes to run condor_q. If not, something ain't right. The way it removes jobs is to set LeaveJobInQueue to "FALSE". > > $ condor_q > > -- Submitter: communicado.ci.uchicago.edu : <128.135.125.17:44838> : > communicado.ci.uchicago.edu > ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD > 11.0 aespinosa 5/17 20:05 0+00:04:14 C 0 1.0 bash /panfs/panasa > 12.0 aespinosa 5/17 20:05 0+00:03:57 C 0 1.0 bash /panfs/panasa > > 0 jobs; 0 idle, 0 running, 0 held > > According to my logs, swift has been polling for around 20 minutes no > > 2010-05-17 20:05:13,841-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, > identity=urn:0-12-1-6-1-1 > 2010-05-17 20:05:14,092-0500 INFO AbstractQueuePoller Active: 0, New: > 2, Done: 0 > 2010-05-17 20:05:19,163-0500 INFO AbstractQueuePoller Active: 2, New: > 0, Done: 0 > ... > ... > 2010-05-17 20:25:42,473-0500 INFO AbstractQueuePoller Active: 2, New: > 0, Done: 0 > 2010-05-17 20:25:47,548-0500 INFO AbstractQueuePoller Active: 2, New: > 0, Done: 0 > > > A snippet from the *info logs suggests that the jobs have finished > much much earlier: > ... 
> Progress 2010-05-17 20:05:36.239957000-0500 EXECUTE > Moving back to workflow directory > /panfs/panasas/CMS/data/engage-aespinosa/swift/postproc-fireflyg_small > Progress 2010-05-17 20:07:09.642855000-0500 EXECUTE_DONE > Job ran successfully > Progress 2010-05-17 20:07:09.655935000-0500 MOVING_OUTPUTS > panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt|panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfx.sgt > ... > > The gridmanager finished around two minutes after. This is probably > the time condor_q reported a 'DONE' status on the jobs: > ... > 5/17 20:09:47 [18799] (11.0) doEvaluateState called: gmState > GM_CHECK_OUTPUT, globusState 8 > 05/17 20:09:49 [18799] (11.0) doEvaluateState called: gmState > GM_DONE_SAVE, globusState 8 > 05/17 20:09:49 [18799] (11.0) doEvaluateState called: gmState > GM_DONE_COMMIT, globusState 8 > 05/17 20:09:54 [18799] No jobs left, shutting down > 05/17 20:09:54 [18799] Got SIGTERM. Performing graceful shutdown. > 05/17 20:09:54 [18799] **** condor_gridmanager (condor_GRIDMANAGER) > pid 18799 EXITING WITH STATUS 0 > > > > 2010/5/17 Mihael Hategan : > > On Mon, 2010-05-17 at 19:40 -0500, Allan Espinosa wrote: > >> Just to confirm, the provider does the job removal itself? > >> > >> > leave_in_queue = TRUE > > > > There's a bunch of relevant stuff in QueuePoller.removeDoneJob. > > > > There's something else in CondorExecutor: > > if ("true".equals(spec.getAttribute("holdIsFailure"))) { > > wr.write("periodic_remove = JobStatus == 5\n"); > > } > > > > Which may perhaps be extended. The thing with letting condor remove the > > job automatically is that the exit code may not be detected. On the > > other had there may have been some attempts to use condor log files to > > process job information rather than polling the queue. I'm not sure to > > what extent those are in SVN. 
> > > > > >> > >> -Allan > >> > >> 2010/5/17 Allan Espinosa : > >> > I was poking around the provider-condor source tree today. > >> > > >> > provider-condor/src/org/globus/cog/abstraction/impl/execution/condor/DescriptionFileGenerator.java:33+: > >> > ... > >> > >> /TEST_158_0_subfx.sgt > >> > extract_sgt_yfile=panfs/panasas/CMS/data/engage-aespinosa/swift/158/0/TEST_158_0_subfy.sgt > >> > notification = Never > >> > leave_in_queue = TRUE > >> > queue > >> > > >> > I was at least expecting to the the line to start with '##### ...\n # > >> > Task : ..." . Is there another place I should poke around to figure > >> > out the jobspec to condor submit file? Like where does "jobType=grid" > >> > get translated to "Universe=grid"? > >> > From aespinosa at cs.uchicago.edu Tue May 18 15:15:38 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 18 May 2010 15:15:38 -0500 Subject: [Swift-devel] condor_q not being parsed properly by provider-localscheduler Message-ID: Hi, I activated debugging for *.common and *.condor packages in log4j.properties of swift 2010-05-18 15:05:25,052-0500 INFO AbstractQueuePoller Active: 2, New: 0, Done: 0 2010-05-18 15:05:30,054-0500 DEBUG AbstractQueuePoller Polling queue 2010-05-18 15:05:30,054-0500 DEBUG AbstractQueuePoller Poll command: [condor_q, -format, %s, ClusterId, -format, %d, JobStatus, -format, %d, ExitCode] 2010-05-18 15:05:30,060-0500 DEBUG QueuePoller Processing condor_q stdout 2010-05-18 15:05:30,139-0500 DEBUG AbstractQueuePoller Stderr from poll command: 2010-05-18 15:05:30,139-0500 INFO AbstractQueuePoller Active: 2, New: 0, Done: 0 2010-05-18 15:05:35,141-0500 DEBUG AbstractQueuePoller Polling queue Looking at the DEBUG statements, it looks like nothing in the case statement in condor/QueuePoller.java gets satisfied. 
I ran the condor command it produced manually: $ condor_q -format %s ClusterID -format %d JobStatus -format %d ExitCode 20402140$ normal condor_q output: ]$ condor_q -- Submitter: communicado.ci.uchicago.edu : <128.135.125.17:44838> : communicado.ci.uchicago.edu ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD 20.0 aespinosa 5/18 14:57 0+00:02:54 C 0 1.0 bash /panfs/panasa 21.0 aespinosa 5/18 14:57 0+00:03:10 C 0 1.0 bash /panfs/panasa 0 jobs; 0 idle, 0 running, 0 held I'll poke around some more if I can get some virtual cookies :) -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Tue May 18 17:50:37 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 18 May 2010 17:50:37 -0500 Subject: [Swift-devel] Patch: provider-localscheduler condor_submit job id parsing Message-ID: AbstractExecutor.getOutput() adds a newline '\n' character. So the job id that gets passed onto createJob is '##.\n' just added a line as a workaround to the newline in the CondorExecutor command parser. diff --git a/modules/provider-localscheduler/src/org/globus/cog/abstraction/impl index 7e67d64..8b257f1 100644 --- a/modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/schedu +++ b/modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/schedu @@ -236,6 +236,7 @@ public class CondorExecutor extends AbstractExecutor { } protected String parseSubmitCommandOutput(String out) throws IOException + out = out.trim(); if (out.endsWith(".")) { out = out.substring(0, out.length() - 1); } -- Allan M. 
Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Tue May 18 20:14:32 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 18 May 2010 20:14:32 -0500 Subject: [Swift-devel] condor_q not being parsed properly by provider-localscheduler In-Reply-To: References: Message-ID: <1274231672.5295.5.camel@localhost> On Tue, 2010-05-18 at 15:15 -0500, Allan Espinosa wrote: > I'll poke around some more if I can get some virtual cookies :) I personally guarantee an amount of virtual cookies commensurate with the effort. From hategan at mcs.anl.gov Tue May 18 20:18:26 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 18 May 2010 20:18:26 -0500 Subject: [Swift-devel] Patch: provider-localscheduler condor_submit job id parsing In-Reply-To: References: Message-ID: <1274231906.5295.6.camel@localhost> cog trunk/r2752 and one virtual cookie. Mihael On Tue, 2010-05-18 at 17:50 -0500, Allan Espinosa wrote: > AbstractExecutor.getOutput() adds a newline '\n' character. So the > job id that gets passed onto createJob is '##.\n' > > just added a line as a workaround to the newline in the CondorExecutor > command parser. > > diff --git a/modules/provider-localscheduler/src/org/globus/cog/abstraction/impl > index 7e67d64..8b257f1 100644 > --- a/modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/schedu > +++ b/modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/schedu > @@ -236,6 +236,7 @@ public class CondorExecutor extends AbstractExecutor { > } > > protected String parseSubmitCommandOutput(String out) throws IOException > + out = out.trim(); > if (out.endsWith(".")) { > out = out.substring(0, out.length() - 1); > } > > From aespinosa at cs.uchicago.edu Mon May 24 16:05:04 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 24 May 2010 16:05:04 -0500 Subject: [Swift-devel] computing job queue time. 
Message-ID: Hi, I just wanted to make sure my assumptions are correct: execute2 = dostagein time + queue time + running time + dostageout time So to get queue time, I just subtract dostagein and dostageout from logs and execution timestamps from *info files, right? Of course there are always overheads for state transitions of jobs, which are much lower because of the constraints of the storage and computation resources themselves. Thanks, -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Mon May 24 16:41:03 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 24 May 2010 16:41:03 -0500 Subject: [Swift-devel] computing job queue time. In-Reply-To: References: Message-ID: <1274737263.14449.4.camel@blabla2.none> On Mon, 2010-05-24 at 16:05 -0500, Allan Espinosa wrote: > Hi, > > I just wanted to make sure my assumptions are correct: > > execute2 = dostagein time + queue time + running time + dostageout time pretty much > > So to get queue time, I just subtract dostagein and dostageout from > logs and execution timestamps from *info files, right? to get queuing time look at the difference between Submitted and Active events. Unless running with coasters in which case queuing time does not reflect the time a job spends in the LRM queue. > > Of course there are always overheads for state transitions of jobs, which > are much lower because of the constraints of the storage and > computation resources themselves. Right. Either way. From hategan at mcs.anl.gov Mon May 24 16:42:08 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 24 May 2010 16:42:08 -0500 Subject: [Swift-devel] computing job queue time. In-Reply-To: References: Message-ID: <1274737328.14449.5.camel@blabla2.none> On Mon, 2010-05-24 at 16:05 -0500, Allan Espinosa wrote: > Hi, > > I just wanted to make sure my assumptions are correct: > > execute2 = dostagein time + queue time + running time + dostageout time Pretty much.
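The decomposition confirmed above can be turned around to estimate queue time when the other three terms are known. This is a minimal sketch with made-up numbers; the function name is hypothetical, and the result also absorbs whatever state-transition overhead sits between the measured phases.

```python
def queue_time(execute2, stagein, running, stageout):
    """Solve execute2 = stagein + queue + running + stageout for queue.

    All arguments are durations in seconds.
    """
    q = execute2 - stagein - running - stageout
    if q < 0:
        # The measured parts cannot add up to more than the whole.
        raise ValueError("components exceed execute2 duration")
    return q


# Hypothetical durations in seconds, not taken from a real run.
print(queue_time(execute2=400.0, stagein=5.0, running=300.0, stageout=15.0))
```

Because the four durations come from different sources (Swift log for staging, *info files for execution), clock differences between the submit host and the worker node can skew the estimate, which is why the negative case is rejected rather than silently returned.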
> > So to get queue time, i just subtract dostagein and dostageout from > logs and execution timestamps from *info files right? You should look at the difference between Submitted and Active. Though I'm not sure to what extent different providers keep the timestamp of original events, or to what extent the middleware itself provides these timestamps. But the ideal case would be that in which you can extract the exact time at which an event occurred (not when it was received). > > of course there's always overheads for state transitions of jobs which > is much much lower because of the constrainst of the storage and > computation resources themselves. Right. Though see above comment. From hategan at mcs.anl.gov Mon May 24 16:43:59 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 24 May 2010 16:43:59 -0500 Subject: [Swift-devel] computing job queue time. In-Reply-To: <1274737263.14449.4.camel@blabla2.none> References: <1274737263.14449.4.camel@blabla2.none> Message-ID: <1274737439.14680.0.camel@blabla2.none> Ah, so when the window disappeared, I pressed the shortcut for "send email"... On Mon, 2010-05-24 at 16:41 -0500, Mihael Hategan wrote: > On Mon, 2010-05-24 at 16:05 -0500, Allan Espinosa wrote: > > Hi, > > > > I just wanted to make sure my assumptions are correct: > > > > execute2 = dostagein time + queue time + running time + dostageout time > > pretty much > > > > > So to get queue time, i just subtract dostagein and dostageout from > > logs and execution timestamps from *info files right? > > to get queuing time look at the difference between Submitted and Active > events. Unless running with coasters in which case queuing time does not > reflect the time a job spends in the LRM queue. > > > > > > of course there's always overheads for state transitions of jobs which > > is much much lower because of the constrainst of the storage and > > computation resources themselves. > > Right. Either way. 
> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From aespinosa at cs.uchicago.edu Mon May 24 19:58:53 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 24 May 2010 19:58:53 -0500 Subject: [Swift-devel] coaster worker syntax errors Message-ID: I would like to use trunk to be able to use the CDM features when running on OSG resources. In the meantime i'll tryout cog-stable and swift-trunk. swift-r3288 cog-r2752 RunID: 20100524-1955-h9to7jt4 Progress: Progress: Selecting site:1 Initializing site shared directory:1 Stage in:1 Progress: Submitting:2 Submitted:1 Progress: Submitted:2 Active:1 Failed to transfer wrapper log from 066-many-20100524-1955-h9to7jt4/info/n on TERAPORT Progress: Submitted:2 Failed:1 Execution failed: Exception in sleep: Arguments: [300] Host: TERAPORT Directory: 066-many-20100524-1955-h9to7jt4/jobs/n/sleep-nivmsfsj stderr.txt: stdout.txt: ---- Caused by: Task failed: 0524-550712-000000 Block task ended prematurely Use of uninitialized value in concatenation (.) or string at /home/osgvo/engage/.globus/coasters/cscript2747241842007159708.pl line 192. Failed to connect: Illegal seek at /home/osgvo/engage/.globus/coasters/cscript2747241842007159708.pl line 169. Cleaning up... Shutting down service at https://128.135.125.118:57300 Got channel MetaChannel: 988943951 -> GSSSChannel-01884231335(1) + Done -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Mon May 24 20:04:08 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 24 May 2010 20:04:08 -0500 Subject: [Swift-devel] swift tree getting build options from cog? 
Message-ID: Hi, I'm building swift-trunk together with cog-stable but get these build errors: [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/swift/data/policy/Local.java:11: annotations are not supported in -source 1.4 [javac] (try -source 1.5 to enable annotations) [javac] @Override [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/swift/data/policy/Local.java:12: generics are not supported in -source 1.4 [javac] (try -source 1.5 to enable generics) [javac] public void settings(List tokens) { [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/swift/data/policy/Policy.java:13: generics are not supported in -source 1.4 [javac] (try -source 1.5 to enable generics) [javac] public abstract void settings(List tokens); [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/SiteProfile.java:53: generics are not supported in -source 1.4 [javac] (try -source 1.5 to enable generics) [javac] private static final Set DEFAULTS_NAMES; [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/swiftscript/Java.java:62: for-each loops are not supported in -source 1.4 [javac] (try -source 1.5 to enable for-each loops) [javac] for (Method m : methods) { [javac] ^ [javac] 16 errors BUILD FAILED /autonfs/home/aespinosa/swift/cogkit/modules/swift/build.xml:73: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:464: The following error occurred while executing this line: /autonfs/home/aespinosa/swift/cogkit/mbuild.xml:228: Compile failed; see the compiler error output for details. Total time: 26 seconds Is my guess correct that the swift tree is picking-up build options from the cogtree aside from the classpaths for necessary jars? Thanks, -Allan -- Allan M. 
Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Mon May 24 20:31:59 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 24 May 2010 20:31:59 -0500 Subject: [Swift-devel] swift tree getting build options from cog? In-Reply-To: References: Message-ID: <1274751119.18333.2.camel@blabla2.none> Re: [Swift-devel] swift tree getting build options from cog? That's right. Edit modules/../mbuild.xml and change force.java.version to 1.5. On Mon, 2010-05-24 at 20:04 -0500, Allan Espinosa wrote: [...] From aespinosa at cs.uchicago.edu Mon May 24 20:48:15 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 24 May 2010 20:48:15 -0500 Subject: [Swift-devel] swift tree getting build options from cog? In-Reply-To: <1274751119.18333.2.camel@blabla2.none> References: <1274751119.18333.2.camel@blabla2.none> Message-ID: ahhh.. thanks! 2010/5/24 Mihael Hategan : > Re: [Swift-devel] swift tree getting build options from cog? > > That's right. Edit modules/../mbuild.xml and change force.java.version > to 1.5. > > On Mon, 2010-05-24 at 20:04 -0500, Allan Espinosa wrote: > [...] > > From aespinosa at cs.uchicago.edu Mon May 24 20:52:06 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 24 May 2010 20:52:06 -0500 Subject: [Swift-devel] swift tree getting build options from cog? 
In-Reply-To: <1274751119.18333.2.camel@blabla2.none> References: <1274751119.18333.2.camel@blabla2.none> Message-ID: It seems that swift-trunk depends on specific cog-trunk coasters classes that are not compatible and thus still breaks the build: [echo] [swift]: COMPILE [mkdir] Created dir: /autonfs/home/aespinosa/swift/cogkit/modules/swift/buil d [javac] Compiling 357 source files to /autonfs/home/aespinosa/swift/cogkit/m odules/swift/build [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/sw ift/data/policy/AllocationHook.java:6: cannot find symbol [javac] symbol : class Hook [javac] location: package org.globus.cog.abstraction.coaster.service.job.man ager [javac] import org.globus.cog.abstraction.coaster.service.job.manager.Hook; [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/swift/data/policy/AllocationHook.java:13: cannot find symbol [javac] symbol: class Hook [javac] public class AllocationHook extends Hook [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/Execute.java:37: cannot find symbol [javac] symbol : variable C_STAGEIN [javac] location: class org.griphyn.vdl.karajan.lib.Execute [javac] A_REPLICATION_CHANNEL, A_JOBID, C_STAGEIN, C_STAGEOUT, C_CLEANUP }); [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/Execute.java:37: cannot find symbol [javac] symbol : variable C_STAGEOUT [javac] location: class org.griphyn.vdl.karajan.lib.Execute [javac] A_REPLICATION_CHANNEL, A_JOBID, C_STAGEIN, C_STAGEOUT, C_CLEANUP }); [javac] ^ [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/Execute.java:37: cannot find symbol [javac] symbol : variable C_CLEANUP [javac] location: class org.griphyn.vdl.karajan.lib.Execute [javac] A_REPLICATION_CHANNEL, A_JOBID, C_STAGEIN, C_STAGEOUT, C_CLEANUP }); [javac] ^ [javac] Note: Some input files use or override a deprecated API. 
[javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. 2010/5/24 Mihael Hategan : > Re: [Swift-devel] swift tree getting build options from cog? > > That's right. Edit modules/../mbuild.xml and change force.java.version > to 1.5. > > On Mon, 2010-05-24 at 20:04 -0500, Allan Espinosa wrote: > [...] From hategan at mcs.anl.gov Mon May 24 20:58:35 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 24 May 2010 20:58:35 -0500 Subject: [Swift-devel] swift tree getting build options from cog? In-Reply-To: References: <1274751119.18333.2.camel@blabla2.none> Message-ID: <1274752715.18987.5.camel@blabla2.none> Now clearly when you try to breed a giraffe with a pig, you are unlikely to get a pony. But you could do a diff between trunk and branch and try to select only the necessary features. You will need the hook(s) for the CDM stuff, but you can probably add noops for the stage-in/out code (i.e. just copy the channel argument definitions or remove their use in swift trunk). 
On Mon, 2010-05-24 at 20:52 -0500, Allan Espinosa wrote: > It seems that swift-trunk depends on specific cog-trunk coasters > classes that are not compatible and thus still breaks the build: > > [echo] [swift]: COMPILE > [mkdir] Created dir: /autonfs/home/aespinosa/swift/cogkit/modules/swift/buil > d > [javac] Compiling 357 source files to /autonfs/home/aespinosa/swift/cogkit/m > odules/swift/build > [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/sw > ift/data/policy/AllocationHook.java:6: cannot find symbol > [javac] symbol : class Hook > [javac] location: package org.globus.cog.abstraction.coaster.service.job.man > ager > [javac] import org.globus.cog.abstraction.coaster.service.job.manager.Hook; > [javac] ^ > [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/globus/swift/data/policy/AllocationHook.java:13: > cannot find symbol > [javac] symbol: class Hook > [javac] public class AllocationHook extends Hook > [javac] ^ > [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/Execute.java:37: > cannot find symbol > [javac] symbol : variable C_STAGEIN > [javac] location: class org.griphyn.vdl.karajan.lib.Execute > [javac] A_REPLICATION_CHANNEL, > A_JOBID, C_STAGEIN, C_STAGEOUT, C_CLEANUP }); > [javac] ^ > [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/Execute.java:37: > cannot find symbol > [javac] symbol : variable C_STAGEOUT > [javac] location: class org.griphyn.vdl.karajan.lib.Execute > [javac] A_REPLICATION_CHANNEL, > A_JOBID, C_STAGEIN, C_STAGEOUT, C_CLEANUP }); > [javac] > ^ > [javac] /autonfs/home/aespinosa/swift/cogkit/modules/swift/src/org/griphyn/vdl/karajan/lib/Execute.java:37: > cannot find symbol > [javac] symbol : variable C_CLEANUP > [javac] location: class org.griphyn.vdl.karajan.lib.Execute > [javac] A_REPLICATION_CHANNEL, > A_JOBID, C_STAGEIN, C_STAGEOUT, C_CLEANUP }); > [javac] > ^ > [javac] Note: Some input files use or 
override a deprecated API. > [javac] Note: Recompile with -Xlint:deprecation for details. > [javac] Note: Some input files use unchecked or unsafe operations. > [javac] Note: Recompile with -Xlint:unchecked for details. > > > 2010/5/24 Mihael Hategan : > > Re: [Swift-devel] swift tree getting build options from cog? > > > > That's right. Edit modules/../mbuild.xml and change force.java.version > > to 1.5. > > > > On Mon, 2010-05-24 at 20:04 -0500, Allan Espinosa wrote: > > [...] From benc at hawaga.org.uk Tue May 25 01:20:24 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 25 May 2010 06:20:24 +0000 (GMT) Subject: [Swift-devel] computing job queue time. In-Reply-To: <1274737328.14449.5.camel@blabla2.none> References: <1274737328.14449.5.camel@blabla2.none> Message-ID: > You should look at the difference between Submitted and Active. > > Though I'm not sure to what extent different providers keep the > timestamp of original events, or to what extent the middleware itself > provides these timestamps. But the ideal case would be that in which you > can extract the exact time at which an event occurred (not when it was > received). For GRAM2 and GRAM4, Active notifications were both pretty poor (a minute error or more would not be surprising). There's a log plot graph or two somewhere to compare info start time with Active notification time to show that. 
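As a rough illustration of the Submitted/Active approach discussed in this thread, here is a minimal Python sketch. It assumes the status events have already been pulled out of the log into (timestamp, job id, status) tuples; that record layout and the name queue_seconds are hypothetical and are not Swift's actual log format. Per the caveats above, the result is only as good as the Active timestamps (which may lag by a minute or more on GRAM2/GRAM4), and with coasters it does not reflect time spent in the LRM queue.

```python
from datetime import datetime


def queue_seconds(events):
    """Return {job_id: seconds between first Submitted and first Active}.

    `events` is an iterable of (timestamp, job_id, status) tuples.
    This layout is an assumption made for the sketch, not a Swift format.
    Jobs that never reached Active are left out of the result.
    """
    first_seen = {}
    for ts, job, status in events:
        key = (job, status)
        # Keep only the earliest occurrence of each state per job.
        if status in ("Submitted", "Active") and key not in first_seen:
            first_seen[key] = ts

    result = {}
    for (job, status), submitted in first_seen.items():
        active = first_seen.get((job, "Active"))
        if status == "Submitted" and active is not None:
            result[job] = (active - submitted).total_seconds()
    return result


# Made-up events: job-1 waits 2.5 minutes, job-2 never becomes Active.
events = [
    (datetime(2010, 5, 24, 16, 0, 0), "job-1", "Submitted"),
    (datetime(2010, 5, 24, 16, 1, 0), "job-2", "Submitted"),
    (datetime(2010, 5, 24, 16, 2, 30), "job-1", "Active"),
]
print(queue_seconds(events))  # {'job-1': 150.0}
```

Taking the first occurrence of each state keeps retries or repeated notifications from shortening the measured wait; whether that is the right choice depends on how the provider reports events.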
--

From wilde at mcs.anl.gov Tue May 25 18:50:19 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 25 May 2010 18:50:19 -0500 (CDT)
Subject: [Swift-devel] Welcome 2010 Summer Students
In-Reply-To: <32143197.79111274811922068.JavaMail.root@zimbra>
Message-ID: <15794390.99011274831419238.JavaMail.root@zimbra>

I want to welcome the 2010 summer students that will be working on Swift:

* Arjun Comar, Rose Hulman Institute, IN, will be working on Swift multisite execution
* David Kelly, Shippensburg University, PA, will be working on Swift tutorials
* Jon Monette, UTexas Austin, TX, will be working on the Montage application in Swift
* Thiago Silva, Campina Grande, Brazil, will be working on the Mosa distributed filesystem
* Dennis Touchet, UTexas Brownsville, TX, will be working on Swift testing

Jon will work on site at the CI and Argonne; Arjun, David, Thiago, and Dennis will work remotely. Arjun, being located near Argonne, will come to Argonne roughly once a week.

David and Thiago are GSoC students, Jon is an Argonne SULI student, Arjun is working for the CI, and Dennis is working for UTB.

The intent is that the students will work as part of an interlocking and extended effort to enhance and apply Swift. I see the efforts of David, Arjun, and Dennis as interlocking very tightly, with Jon working as a Swift user and Thiago as providing a new component for Swift usage on high performance systems.

Dan Katz will supervise Jon's work on Montage and Justin will supervise Thiago's work in coordination with Matei Ripeanu.

We'll start using the Swift lists now for almost all our collaboration on these projects:

swift-devel for development issues
swift-user for "how to use Swift" questions

There's also a "swft at ci.uchicago.edu" list for internal group issues specific to the UChicago/Argonne community like local machine resources, etc. This is seldom needed.
- Mike From wilde at mcs.anl.gov Tue May 25 19:18:57 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 25 May 2010 19:18:57 -0500 (CDT) Subject: [Swift-devel] Notes on maintaining the swift web and docs Message-ID: <5124436.99271274833137828.JavaMail.root@zimbra> To help David in creating a procedure for test copies of the Swift web and for generating the Swift docs with docbook, I placed notes gleaned from swift-devel at: http://www.ci.uchicago.edu/wiki/bin/view/SWFT/MaintainingSwiftWebContent If anyone has tips to refine those instructions, please update that page or post suggestions here for David to try and then post for general use. As suggested on that page, most of the notes should wind up in README files at www/ and docs/ - Mike From wilde at mcs.anl.gov Tue May 25 19:34:29 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 25 May 2010 19:34:29 -0500 (CDT) Subject: [Swift-devel] Re: Welcome to the CI and Swift group In-Reply-To: <9997105.99321274833613083.JavaMail.root@zimbra> Message-ID: <29924550.99461274834069675.JavaMail.root@zimbra> Ben, Mihael, Justin: I've sent the following getting-started info to Dennis Touchet of UTB to help get him started on the summer project of Swift testing: - extend local language tests - extend testing to do good cluster and grid tests - revive and extend continuous NMI Build & Test processing If you have additional pointers or advice to supplement what I listed below, please provide. Thanks, Mike --- Things to start reading up on and experimenting with: - try Swift tutorials. These will be updated soon by David Kelly, who will welcome your feedback. - download Swift source from SVN and try running the language test suite - look at the structure of the tests to learn how to add new tests - Read up on the NSF NMI "Build and Test System" called "Metronome" from UWisconsin: https://nmi.cs.wisc.edu/ You should request a Metronome user login from that web site, and read their many documents. 
Some Swift tests were placed under Metronome. The person who created and maintained this (and all Swift testing), Ben Clifford, left the group a year ago, and so these tests and the whole test effort stopped. With your help this summer, we want to revive that and start running and growing our tests once again.

Hope this little summary helps you get started in thinking, reading, and experimenting. Let's do a phone call when you are ready to start, and then have you join regular (hopefully daily) short group phone calls to coordinate our efforts.

I will echo the essence of this email to the Swift-devel list so that people can chime in with hopefully helpful advice.

Regards,

Mike

From wilde at mcs.anl.gov Tue May 25 19:35:31 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 25 May 2010 19:35:31 -0500 (CDT)
Subject: [Swift-devel] Startup pointers on Swift testing procedures
In-Reply-To: <29924550.99461274834069675.JavaMail.root@zimbra>
Message-ID: <2880641.99491274834131816.JavaMail.root@zimbra>

[resending with a proper subject]

Ben, Mihael, Justin: I've sent the following getting-started info to Dennis Touchet of UTB to help get him started on the summer project of Swift testing:

- extend local language tests
- extend testing to do good cluster and grid tests
- revive and extend continuous NMI Build & Test processing

If you have additional pointers or advice to supplement what I listed below, please provide.

Thanks,

Mike

---

Things to start reading up on and experimenting with:

- try Swift tutorials. These will be updated soon by David Kelly, who will welcome your feedback.
- download Swift source from SVN and try running the language test suite
- look at the structure of the tests to learn how to add new tests
- Read up on the NSF NMI "Build and Test System" called "Metronome" from UWisconsin: https://nmi.cs.wisc.edu/ You should request a Metronome user login from that web site, and read their many documents.

Some Swift tests were placed under Metronome.
The person who created and maintained this (and all Swift testing), Ben Clifford, left the group a year ago, and so these tests and the whole test effort stopped. With your help this summer, we want to revive that and start running and growing our tests once again.

Hope this little summary helps you get started in thinking, reading, and experimenting. Let's do a phone call when you are ready to start, and then have you join regular (hopefully daily) short group phone calls to coordinate our efforts.

I will echo the essence of this email to the Swift-devel list so that people can chime in with hopefully helpful advice.

Regards,

Mike

From wilde at mcs.anl.gov Wed May 26 10:02:14 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 26 May 2010 10:02:14 -0500 (CDT)
Subject: [Swift-devel] Tools for checking grid sites
Message-ID: <28776944.111211274886134010.JavaMail.root@zimbra>

As Arjun gets started on the multi-site execution project, it would be helpful to point him to good tools to check out site functionality for GRAM and GridFTP. Many people have written scripts for this.

Can you send your suggestions on such tools to the list? Which work well, which don't?

Thanks,

- Mike

From aespinosa at cs.uchicago.edu Wed May 26 10:08:29 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 26 May 2010 10:08:29 -0500
Subject: [Swift-devel] Tools for checking grid sites
In-Reply-To: <28776944.111211274886134010.JavaMail.root@zimbra>
References: <28776944.111211274886134010.JavaMail.root@zimbra>
Message-ID:

The OSG docs show how to poke their BDII using ldapsearch and condor_status:
https://twiki.grid.iu.edu/bin/view/Documentation/FindAvailableResource

I rewrote some of the ADEM code that uses the above methods (http://github.com/aespinosa/adem)

2010/5/26 Michael Wilde :
> As Arjun gets started on the multi-site execution project, it would be helpful to point him to good tools to check out site functionality for GRAM and GridFTP.
Many people have written scripts for this.
>
> Can you send your suggestions on such tools to the list? Which work well, which don't?
>
> Thanks,
>
> - Mike

From benc at hawaga.org.uk Wed May 26 10:22:00 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 26 May 2010 15:22:00 +0000 (GMT)
Subject: [Swift-devel] Tools for checking grid sites
In-Reply-To: <28776944.111211274886134010.JavaMail.root@zimbra>
References: <28776944.111211274886134010.JavaMail.root@zimbra>
Message-ID:

> As Arjun gets started on the multi-site execution project, it would be
> helpful to point him to good tools to check out site functionality for
> GRAM and GridFTP. Many people have written scripts for this.
>
> Can you send your suggestions on such tools to the list? Which work
> well, which don't?

The engage VO used to have a pretty good setup for monitoring sites for suitability for Engage apps - they took info from OSG and ran their own probe-based filtering on top. Mats Rynge was heavily involved in that, but probably not now as he works for someone else these days.

--

From iraicu at cs.uchicago.edu Wed May 26 10:24:17 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 26 May 2010 10:24:17 -0500
Subject: [Swift-devel] Call for Participation at ACM ScienceCloud2010
Message-ID: <4BFD3D21.5020505@cs.uchicago.edu>

Dear all,

We are less than a month away from the 1st ACM Workshop on Scientific Cloud Computing (ScienceCloud2010, http://dsl.cs.uchicago.edu/ScienceCloud2010/), which is co-located with the ACM International Symposium on High Performance Distributed Computing (HPDC, http://hpdc2010.eecs.northwestern.edu/), and will take place in Chicago, Illinois, on Monday, June 21st. I wanted to bring a few things to your attention:

1. We have Dr. Dennis Gannon from Microsoft Research giving the keynote talk (http://dsl.cs.uchicago.edu/ScienceCloud2010/program.htm#Keynote)!
2.
The complete program is now online at http://dsl.cs.uchicago.edu/ScienceCloud2010/program.htm, including all session chairs and accepted papers. 3. We have a panel discussion organized as a conclusion to the workshop; more information on the panel can be found at http://dsl.cs.uchicago.edu/ScienceCloud2010/program.htm#Panel_on_Scientific_Cloud_Computing. The panel discussion moderator is Dr. Pete Beckman (Argonne National Laboratory & University of Chicago), and the five panelists are Dr. Manish Parashar (Rutgers University), Dr. Dennis Gannon (Microsoft Research), Dr. Kate Keahey (Argonne National Laboratory & University of Chicago), Dr. Peter Dinda (Northwestern University), and Dr. Bob Grossman (University of Illinois at Chicago). Please submit questions for the panel discussion you would like to see answered during the panel to iraicu at eecs.northwestern.edu with a subject "ScienceCloud2010: panel discussion questions". 4. With the generous sponsorship of Microsoft Research and Indiana University, we will be organizing and hosting a general workshop reception/dinner on Monday evening, for which all registered HPDC and/or workshops attendees will be able to attend free of charge. The specifics around this reception/dinner are still in the works, but if you will be in Chicago on Monday evening, keep your calendar open. Don't forget that the HPDC/workshop early registration ends June 1st, after which the registration fees will increase. We look forward to seeing you in Chicago next month! Regards, Dr. Ioan Raicu, Dr. Pete Beckman, and Dr. Ian Foster ACM ScienceCloud2010 Chairs http://dsl.cs.uchicago.edu/ScienceCloud2010/ -- ================================================================= Ioan Raicu, Ph.D. 
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
     https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================

From iraicu at cs.uchicago.edu Wed May 26 11:50:21 2010
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Wed, 26 May 2010 11:50:21 -0500
Subject: [Swift-devel] Call for Participation: 19th ACM International Symposium on High Performance Distributed Computing (HPDC) 2010
Message-ID: <4BFD514D.3020209@cs.uchicago.edu>

HPDC 2010 Call For Participation

19th ACM International Symposium on High Performance Distributed Computing
Chicago, Illinois, USA
June 20-25, 2010
http://hpdc2010.eecs.northwestern.edu/

Overview

The ACM International Symposium on High Performance Distributed Computing (HPDC) is the premier venue for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high performance and high end computing. The 19th installment of HPDC will take place in the heart of Chicago, Illinois, the third largest city in the United States and a major technological and cultural capital located on the shore of Lake Michigan, one of the largest freshwater lakes in the world.
Highlights # 3 Keynotes: Guy Steele (Sun/Oracle), Randal Bryant (CMU), Robert Harrison (ORNL) # Single track presentation of 23 full papers (25% acceptance rate) and 22 posters / Workflows, Resources and Clouds, Map Reduce and Debugging, Data Centers and Virtualization, Storage and I/O, Applications and Provenance, Communication and Scheduling, Best Papers/ # Panel on expanding parallel programming beyond the usual suspects # Industry session /(email hpdc2010-industry-session at presciencelab.org for participation)/ # Wild and crazy ideas session / (email hpdc2010-wild at presciencelab.org for participation)/ # 8 co-located workshops , including 8 keynotes, 5 panels, and 62 paper presentations / *ECMLS*: Emerging Computational Methods for the Life Sciences; *LSAP*: Large-Scale System and Application Performance; *MDQCS*: Managing Data Quality for Collaborative Science; *ScienceCloud*: Workshop on Scientific Cloud Computing; *CLADE*: Challenges of Large Applications in Distributed Environments; *DIDC*: Data Intensive Distributed Computing; *MAPREDUCE*: Map Reduce and its Applications; *VTDC*: Virtualization Technologies for Distributed Computing/ # Co-located Open Grid Forum meeting (OGF29) # Full program available from web site (hpdc.org) Venue, Registration, Student Travel Grants, and Sponsorship All events will take place at the Doubletree Hotel Chicago Magnificent Mile, located within easy walking distance of numerous downtown Chicago attractions and Lake Michigan (including beaches), with easy access to the entire Chicago area via subway. Discounted registration is available through May 31. For more information and to register, please visit http://hpdc2010.eecs.northwestern.edu/. 
HPDC is pleased to acknowledge support or sponsorship from ACM, NSF, Intel, Google, NVIDIA, Microsoft Research, the Computation Institute at the University of Chicago, Argonne National Lab, the Digital Science Center at Indiana University, the NSF Center for Autonomic Computing at the University of Arizona, and the Department of Electrical Engineering and Computer Science at Northwestern University.

--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
     https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================

From dk0966 at cs.ship.edu Thu May 27 01:39:21 2010
From: dk0966 at cs.ship.edu (David Kelly)
Date: Thu, 27 May 2010 02:39:21 -0400
Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs
Message-ID:

Hello all,

I have been trying to generate the Swift docs with docbook, but am having only partial success. If anyone happens to be familiar with this process, any help would be appreciated.

I have read the README and wiki link that Michael sent, but there are still a few issues. Here is an overview of everything I have done thus far:

After making some changes to the documentation, I created a symlink to docbooks.
I have formatting/docbooks pointing to /home/hategan/docbook/docbook-xsl.
I have tried all of the other docbook directories mentioned in the readme and wiki, but they all seem to behave the same.
Next, when I attempted to run buildguides.sh, I got an error message that fop/fop.sh was not found.
I found /home/hategan/fop.tar.gz and uncompressed it into the doc directory.
That got rid of the fop errors, but then I received "Error: JAVA_HOME is not defined correctly".
Then I followed a maze of symlinks to what I believe is the correct JAVA_HOME on this system (login.ci.uchicago.edu) at /usr/lib/jvm/jre-1.4.2-gcj.
Then I received the following errors:

[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[Fatal Error] refentry.xsl:314:58: The string "--" is not permitted within comments.
[ERROR] javax.xml.transform.TransformerConfigurationException: javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The string "--" is not permitted within comments.
make: *** [userguide.pdf] Error 2

I tracked refentry.xsl down and found it in /usr/share/apps/ksgmltools2/docbook/xsl/html/refentry.xsl.
I could try to modify the comments but I do not have permission. The error is preventing PDF files from being created.

After this section fails, it moves on to create the PHP files. The PHP files get created and work for the most part, except for one nagging problem.
Next to each section number (1.1, 1.2, etc.) a diamond with a question mark inside of it appears.
I double-checked that they do not exist in the XML file, only in the PHP file.
I think this is some strange character map issue. The PHP file sets the character set to UTF-8. When I change Firefox to render the page in ISO8859-15 it works fine.
I have tried changing the character set of the PHP file, and also tried setting the character set through .htaccess with no luck (the pages are being served out of my public_html directory for testing).

Any other ideas and suggestions would be appreciated. Thanks!
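One possible explanation for the diamond symptom, offered as an assumption rather than a diagnosis: if the generated pages are written as ISO-8859-1 and contain a non-breaking space (the single byte 0xA0) after each section number, a browser told to decode them as UTF-8 will show the replacement character at that spot. The Python sketch below reproduces that effect on a made-up heading; the heading text and the helper name render are illustrative only and are not taken from the Swift docs build.

```python
def render(raw: bytes, charset: str) -> str:
    """Decode page bytes as a browser would for the declared charset,
    substituting U+FFFD for any byte sequence that is invalid in it."""
    return raw.decode(charset, errors="replace")


# Hypothetical heading: section number, non-breaking space (U+00A0), title.
heading = "1.1\u00a0Introduction"
raw = heading.encode("iso-8859-1")  # the NBSP becomes the lone byte 0xA0

broken = render(raw, "utf-8")        # a lone 0xA0 is not valid UTF-8
fixed = render(raw, "iso-8859-15")   # 0xA0 is a non-breaking space here

print("\ufffd" in broken)  # True: U+FFFD is the "diamond with a question mark"
print(fixed == heading)    # True
```

This matches the observation that switching Firefox to ISO8859-15 makes the page look right. If the assumption holds, the two ways out are to have the stylesheet emit UTF-8 or to declare the page with the encoding the files are really written in.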
Regards, David Message: 2 > Date: Tue, 25 May 2010 19:18:57 -0500 (CDT) > From: Michael Wilde > Subject: [Swift-devel] Notes on maintaining the swift web and docs > To: Swift Devel > Message-ID: <5124436.99271274833137828. > JavaMail.root at zimbra> > Content-Type: text/plain; charset=utf-8 > > To help David in creating a procedure for test copies of the Swift web and > for generating the Swift docs with docbook, I placed notes gleaned from > swift-devel at: > > http://www.ci.uchicago.edu/wiki/bin/view/SWFT/MaintainingSwiftWebContent > > If anyone has tips to refine those instructions, please update that page or > post suggestions here for David to try and then post for general use. > > As suggested on that page, most of the notes should wind up in README files > at www/ and docs/ > > - Mike > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu May 27 09:49:43 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Thu, 27 May 2010 09:49:43 -0500 (CDT) Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs In-Reply-To: <12039214.151151274971099042.JavaMail.root@zimbra> Message-ID: <337985.151961274971783218.JavaMail.root@zimbra> David, here's what I think may fix the diamond problem: On Oct 13, 2009, at 11:25 PM, Michael Wilde wrote: > Rob, I think I found the param that makes it go away (ie gens the > html pages in UTF-8 encoding. > > It was: > > com$ pwd > /ci/www/projects/osgedu/build/docbook-xsl/html > com$ diff chunker.xsl chunker.xsl.orig > 28,29c28 > < select="'ISO-8859-1'"/> > < > --- > > > com$ > > Does this fix it on all of the pages? > > gridcolombia looks OK to me now as did other things I spot checked. > > After changing the file chunker.xsl above, I just did /ci/www/ > projects/osgedu/update.sh - Mike ----- "David Kelly" wrote: > Hello all, > > I have been trying to generate the Swift docs with docbook, but am > having only partial success. 
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu May 27 12:43:01 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 27 May 2010 12:43:01 -0500
Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs
In-Reply-To: References:
Message-ID: <1274982181.6126.2.camel@blabla2.none>

On Thu, 2010-05-27 at 02:39 -0400, David Kelly wrote:
> Then I followed a maze of symlinks to what I believe is the correct
> JAVA_HOME on this system (login.ci.uchicago.edu)
> at /usr/lib/jvm/jre-1.4.2-gcj
> Then I received the following errors:

gcj is almost never the correct JAVA_HOME. You should find a proper JVM, such as the one from sun.

I believe that softenv has one of the Sun JVMs, but I forget the proper incantation. Does anybody else know it?

Mihael

From wilde at mcs.anl.gov Thu May 27 13:09:08 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 27 May 2010 13:09:08 -0500 (CDT)
Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs
In-Reply-To: <1274982181.6126.2.camel@blabla2.none>
Message-ID: <13507461.162241274983748198.JavaMail.root@zimbra>

Put this line in your ~/.soft file (on CI machines):

+java-sun

This gives you Sun Java 1.6

Sorry, I missed the gcj thing.

- Mike

----- "Mihael Hategan" wrote:

> On Thu, 2010-05-27 at 02:39 -0400, David Kelly wrote:
> > Then I followed a maze of symlinks to what I believe is the correct
> > JAVA_HOME on this system (login.ci.uchicago.edu)
> > at /usr/lib/jvm/jre-1.4.2-gcj
> > Then I received the following errors:
>
> gcj is almost never the correct JAVA_HOME.
> You should find a proper JVM, such as the one from sun.
>
> I believe that softenv has one of the Sun JVMs, but I forget the proper
> incantation. Does anybody else know it?
>
> Mihael
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dk0966 at cs.ship.edu Thu May 27 13:24:39 2010
From: dk0966 at cs.ship.edu (David Kelly)
Date: Thu, 27 May 2010 14:24:39 -0400
Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs
In-Reply-To: <13507461.162241274983748198.JavaMail.root@zimbra>
References: <1274982181.6126.2.camel@blabla2.none> <13507461.162241274983748198.JavaMail.root@zimbra>
Message-ID:

Thanks, that did the trick. PDFs are now being created. The diamond issue in the PHPs was also corrected by your patch. Everything looks good now. I'll update the wiki with these instructions.

Regards,
David

From wilde at mcs.anl.gov Fri May 28 11:23:23 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Fri, 28 May 2010 11:23:23 -0500 (CDT)
Subject: [Swift-devel] Fwd: [Swift-user] Question: Setting environment variables
In-Reply-To: <18475368.195651275063756583.JavaMail.root@zimbra>
Message-ID: <2789507.195711275063803967.JavaMail.root@zimbra>

David, this user question is a good example of an "app note" that we should add to the user guide and the tutorial. As you see questions on the list, can you update the docs as appropriate so that the next user with the same question is more likely to find the answer? Or, keep a list of such updates?
(That's in a sense what the SwiftCookBook page on the SWFT wiki is):

http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftCookBook

- Mike

----- Forwarded Message -----
From: wilde at mcs.anl.gov
To: "Taleena R Sines"
Cc: swift-user at ci.uchicago.edu
Sent: Friday, May 28, 2010 11:15:55 AM GMT -06:00 US/Canada Central
Subject: Re: [Swift-user] Question: Setting environment variables

Taleena,

You set env vars for a given app in the tc.data file using an "Env profile":

http://www.ci.uchicago.edu/swift/guides/userguide.php#profile.env

For example:

localhost myapp /usr/bin/env none none ENV::myenvvar="something";

That's not real clear in the User Guide - we will fix that.

- Mike

----- "Taleena R Sines" wrote:

> Hello,
> In swift, how would I set an environment variable?
> For example, in a c-script:
> setenv WKDIR /user/bin
>
> would translate to swift how?
> Thank you
>
> T. R. Sines
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Fri May 28 12:10:05 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 28 May 2010 12:10:05 -0500 (CDT)
Subject: [Swift-devel] Fwd: [Swift-user] Question: Setting environment variables
In-Reply-To: <9132289.197581275066273288.JavaMail.root@zimbra>
Message-ID: <19692190.197791275066605226.JavaMail.root@zimbra>

David, we should fix this whitespace issue in parsing tc.data. It's a common problem that new users (and old) encounter.
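Until the parser is more forgiving, the tab-versus-space problem can be spotted before Swift sees the file. The following is a rough sketch only: `check_tc_tabs` is a hypothetical helper, not part of Swift, and it assumes that each tc.data entry has six tab-separated fields (matching the example entry in the forwarded message) and that lines starting with `#` are comments.

```shell
#!/bin/sh
# Hypothetical helper, not part of Swift: lists tc.data-style lines that do
# not split into at least six tab-separated fields. The field count of six
# and the "#" comment convention are assumptions taken from the example
# entry in this thread, not from the parser source.
check_tc_tabs() {
    awk -F '\t' '
        # Skip comment lines and blank lines.
        /^[ \t]*(#|$)/ { next }
        # A line whose fields are separated by spaces splits into too few
        # fields when tab is the only separator.
        NF < 6 {
            printf "line %d: %d tab-separated field(s)\n", NR, NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_tc_tabs tc.data` prints nothing and returns 0 when every entry splits cleanly on tabs; an entry typed with spaces shows up as a line with a single field. The check says nothing about whether the field values themselves are valid.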
We held off on this because the original code to parse that file comes from the older VDS system and is now I think supported by Pegasus. Either way, we should fix it: first see if Pegasus has it fixed; else fix it ourselves and post them a fix. Can you use this as an exercise to look for similar bug reports in Swift bugzilla, and file one and/or update the existing one, and fix the problem? Thanks, Mike ----- Forwarded Message ----- From: "Michael Wilde" To: "Taleena R Sines" Cc: "Swift User" Sent: Friday, May 28, 2010 12:04:33 PM GMT -06:00 US/Canada Central Subject: Re: [Swift-user] Question: Setting environment variables Did you make sure that all the fields are separated by a tab character, not spaces? If that was not the cause, I'll need to double-check and test my syntax. (please send all your replies back to the swift-user list). - Mike ----- "Taleena R Sines" wrote: > I inserted that line and it is giving me the error: > unexpected token: / > > > Taleena R. Sines > Major: Computer Science > > > > -----Original Message----- > From: Michael Wilde [ mailto:wilde at mcs.anl.gov ] > Sent: Fri 5/28/2010 12:48 PM > To: Taleena R Sines > Cc: Swift User > Subject: Re: [Swift-user] Question: Setting environment variables > > For: > > setenv CODE_HOME /trsines/code/my_code > > the example: > > localhost myapp /usr/bin/env none none ENV::myenvvar="something"; > > would become: > > localhost myapp /usr/bin/env none none ENV::CODE_HOME > ="/trsines/code/my_code"; > > - Mike > > ----- "Taleena R Sines" wrote: > > > If my code in c-script is : > > setenv CODE_HOME /trsines/code/my_code > > > > What would actually be typed in the swift-script in place of that? > > thank you > > > > T. R. 
Sines

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Fri May 28 12:17:18 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 28 May 2010 12:17:18 -0500 Subject: [Swift-devel] Fwd: [Swift-user]
Question: Setting environment variables In-Reply-To: <19692190.197791275066605226.JavaMail.root@zimbra> References: <19692190.197791275066605226.JavaMail.root@zimbra> Message-ID: <1275067038.22631.1.camel@blabla2.none> On Fri, 2010-05-28 at 12:10 -0500, Michael Wilde wrote: > David, we should fix this whitespace issue in parsing tc.data. Its a > common problem that new users (and old) encounter. > > We held off on this because the original code to parse that file comes > from the older VDS system and is now I think supported by Pegasus. > Either way, we should fix it: first see if Pegasus has it fixed; else > fix it ourselves and post them a fix. A while ago I removed the VDS dependency and extracted only the few files that we were using from there. They are now in the swift svn. So I don't think we have to bother with going through Pegasus, though a patch for them may be the nice thing to do. Mihael From dk0966 at cs.ship.edu Fri May 28 12:28:02 2010 From: dk0966 at cs.ship.edu (David Kelly) Date: Fri, 28 May 2010 13:28:02 -0400 Subject: [Swift-devel] Possible bug with iteration in stable branch Message-ID: Hello, I believe there may be a bug with iteration in the stable branch of Swift. 
Below is the code I am using (which came from the swift tutorial):

iterate.swift
-----------------
type counterfile;

(counterfile t) echo(string m) {
  app {
    echo m stdout=@filename(t);
  }
}

(counterfile t) countstep(counterfile i) {
  app {
    wcl @filename(i) @filename(t);
  }
}

counterfile a[] ;

a[0] = echo("793578934574893");

iterate v {
  a[v+1] = countstep(a[v]);
  trace("extract int value ", @extractint(a[v+1]));
} until (@extractint(a[v+1]) <= 1);
----------------

wcl
---------
#!/bin/bash
echo -n $(wc -c < $1) > $2
---------

Using the development version of swift the script works correctly, with the following output:

Swift svn swift-r3335 cog-r2752
RunID: 20100528-1243-6on02joa
Progress:
SwiftScript trace: extract int value , 16.0
SwiftScript trace: extract int value , 2.0
SwiftScript trace: extract int value , 1.0
Final status: Finished successfully:4

However, when I use the stable branch (either using the tar.gz or by downloading from svn) I get:

Could not start execution.
variable a has multiple writers.

From wilde at mcs.anl.gov Fri May 28 13:13:45 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 28 May 2010 13:13:45 -0500 (CDT)
Subject: [Swift-devel] Fwd: raptor-loop model problem
In-Reply-To: <4BFFE5D8.5050609@mcs.anl.gov>
Message-ID: <18652265.201131275070425061.JavaMail.root@zimbra>

Wenjun, in the attached log, I see 5 boost-threader jobs starting but not finishing.

Then *I think* the coasters start timing out with nothing else to do.

Mihael, can you take a look at this log and work with Wenjun to pinpoint the problem?

Thanks,

- Mike

----- Forwarded Message -----
From: "wenjun wu"
To: "Michael Wilde"
Cc: "Thomas D. Uram"
Sent: Friday, May 28, 2010 10:48:40 AM GMT -06:00 US/Canada Central
Subject: Re: raptor-loop model problem

Hi Mike,

After I fixed the File uploader in the portal, the old problem "cannot open seq file PREPROCESSED/SEQ/T0411D1.seq for read!" is gone.
But the portal still can't get the results from the "BoosterThread" result. The error in the workflow run log is: 2010-05-27 13:25:52,842-0500 WARN RequestHandler org.globus.cog.karajan.workflow.service.channels.IrrecoverableException: Coaster service ended. Reason: null stdout: stderr: at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:230) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:253) at org.globus.cog.abstraction.impl.ssh.execution.JobSubmissionTaskHandler.SSHTaskStatusChanged(JobSubmissionTaskHandler.java:193) at org.globus.cog.abstraction.impl.ssh.SSHRunner.notifyListeners(SSHRunner.java:84) at org.globus.cog.abstraction.impl.ssh.SSHRunner.run(SSHRunner.java:43) at java.lang.Thread.run(Thread.java:595) 2010-05-27 13:25:52,843-0500 INFO AbstractStreamKarajanChannel 1427072207: Channel shut down java.lang.Throwable at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.close(AbstractTCPChannel.java:97) at org.globus.cog.karajan.workflow.service.channels.MetaChannel.close(MetaChannel.java:87) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:232) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:253) at org.globus.cog.abstraction.impl.ssh.execution.JobSubmissionTaskHandler.SSHTaskStatusChanged(JobSubmissionTaskHandler.java:193) at org.globus.cog.abstraction.impl.ssh.SSHRunner.notifyListeners(SSHRunner.java:84) at org.globus.cog.abstraction.impl.ssh.SSHRunner.run(SSHRunner.java:43) at java.lang.Thread.run(Thread.java:595) 2010-05-27 
13:25:52,843-0500 INFO ConnectionProtocol Freeing channel 4 [Unnamed Channel] 2010-05-27 13:25:58,866-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2010-05-27 13:26:08,883-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2010-05-27 13:26:18,905-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2010-05-27 13:26:28,920-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2010-05-27 13:26:38,924-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2010-05-27 13:26:48,934-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2010-05-27 13:26:56,456-0500 INFO TransportProtocolCommon Sending SSH_MSG_DISCONNECT 2010-05-27 13:26:56,456-0500 INFO Service ssh-connection thread is exiting It seems something went wrong after the BoostThreader job is finished. So the swift engine never gets the result back and run into endless waiting. And in the coaster.log, there also are some error messages : 2010-05-27 13:25:46,741-0500 WARN Command Command(38, JOBSTATUS): handling reply timeout; sendReqTime=100527-132346.736, sendTime=100527-132346.737, now=100527-132546.741 2010-05-27 13:25:46,741-0500 WARN Command Command(38, JOBSTATUS)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2010-05-27 13:25:46,851-0500 WARN Command Command(39, JOBSTATUS): handling reply timeout; sendReqTime=100527-132346.847, sendTime=100527-132346.848, now=100527-132546.851 2010-05-27 13:25:46,851-0500 WARN Command Command(39, JOBSTATUS)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) at 
org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2010-05-27 13:25:47,335-0500 WARN Command Command(40, JOBSTATUS): handling reply timeout; sendReqTime=100527-132347.331, sendTime=100527-132347.332, now=100527-132547.335 2010-05-27 13:25:47,335-0500 WARN Command Command(40, JOBSTATUS)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2010-05-27 13:25:47,478-0500 INFO Cpu 0527-011146-000000:3 pull 2010-05-27 13:25:47,804-0500 INFO BlockQueueProcessor Updated allocsize: 6.519897787948322 2010-05-27 13:25:47,805-0500 INFO BlockQueueProcessor allocsize = 6.519897787948322, queuedsize = 0.0, qsz = 0 2010-05-27 13:25:47,805-0500 INFO BlockQueueProcessor Plan time: 1 2010-05-27 13:25:48,084-0500 WARN Command Command(41, JOBSTATUS): handling reply timeout; sendReqTime=100527-132348.080, sendTime=100527-132348.081, now=100527-132548.084 2010-05-27 13:25:48,084-0500 WARN Command Command(41, JOBSTATUS)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2010-05-27 13:25:48,480-0500 INFO Cpu 0527-011146-000000:0 pull 2010-05-27 13:25:49,482-0500 INFO Cpu 0527-011146-000000:5 pull 2010-05-27 13:25:49,734-0500 WARN Command Command(42, JOBSTATUS): handling reply timeout; sendReqTime=100527-132349.730, sendTime=100527-132349.731, 
now=100527-132549.734 2010-05-27 13:25:49,734-0500 WARN Command Command(42, JOBSTATUS)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) 2010-05-27 13:25:50,007-0500 INFO Block Shutting down block Block 0527-011146-000000 (6x4200.000s) 2010-05-27 13:25:50,009-0500 INFO BlockQueueProcessor Cleaned 1 done blocks 2010-05-27 13:25:50,009-0500 INFO BlockQueueProcessor Updated allocsize: 6.519849641186395 The complete log is attached. Wenjun > Hi Mike, > My raptorloop script doesn't work from portal but worked when I run > from the attached shell script. > After digging up swift and raptor log files, > I found out the error message: "cannot open seq file > PREPROCESSED/SEQ/T0411D1.seq for read!" > BoostThreader is supposed to untar the prepared tar ball to PREPROCESSED. > I got a feeling that it was caused by input arguments. 
So from the > portal, the input arguments look like: > > -target=T0411D1 -maxLoopModel=1 -minLoopModelScore=1.0 -minLoopSize=3 > -maxLoopsPerModel=10 -templateList=20 > -prepTar=/gpfs/pads/oops/scienceportal/apache-tomcat-5.5.27/webapps/SIDGridPortal//temp/AE00A497C18DB8885C24D04862A0909A/t0411d1.prep.tar.gz > > -seqFile=/gpfs/pads/oops/scienceportal/apache-tomcat-5.5.27/webapps/SIDGridPortal//temp/1D8389428752F99E3C0A14789C07F55C/t0411d1.fasta > -templatesPerJob=4 > -nModels=10 > > > And the successful raptor run has the following arguments: > -target=T0411D1 \ > -seqFile=/home/aashish/testPrep/T0411D1.fasta \ > -prepTar=/gpfs/pads/oops/scienceportal/apache-tomcat-5.5.27/webapps/SIDGridPortal/temp/AE00A497C18DB8885C24D04862A0909A/t0411d1.prep.tar.gz > \ > -templatesPerJob=4 -templateList=20 -nModels=10 -nSim=4 \ > -loopRunParamFile=$(pwd)/loopmodels.param \ > -maxLoopModels=1 \ > -minLoopModelScore=1.0 \ > -minLoopSize=3 \ > -maxLoopsPerModel=10 > > Could you figure out any reason for this problem? > > Wenjun -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: oops-20100527-1051-78q7905g.log URL: From hategan at mcs.anl.gov Fri May 28 13:37:48 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 28 May 2010 13:37:48 -0500 Subject: [Swift-devel] Fwd: raptor-loop model problem In-Reply-To: <18652265.201131275070425061.JavaMail.root@zimbra> References: <18652265.201131275070425061.JavaMail.root@zimbra> Message-ID: <1275071868.24300.1.camel@blabla2.none> What version of swift/coasters is this and can you post the coaster log (on the remote site in ~/.globus/coasters)? Mihael On Fri, 2010-05-28 at 13:13 -0500, Michael Wilde wrote: > Wenjun, in the attached log, I see 5 boost-threader jobs starting but not finishing. 
> > Then *I think* the coasters start timing out with nothing else to do. > > Mihael, can you take a look at this log and work with Wenjun to pinpoint the problem? > > Thanks, > > - Mike > > ----- Forwarded Message ----- > From: "wenjun wu" > To: "Michael Wilde" > Cc: "Thomas D. Uram" > Sent: Friday, May 28, 2010 10:48:40 AM GMT -06:00 US/Canada Central > Subject: Re: raptor-loop model problem > > Hi Mike, > After I fixed the File uploader in the portal, the old problem > ""cannot open seq file PREPROCESSED/SEQ/T0411D1.seq for read!" is gone. > But the portal still can't get the results from the "BoosterThread" > result. > The error in the workflow run log is: > 2010-05-27 13:25:52,842-0500 WARN RequestHandler > org.globus.cog.karajan.workflow.service.channels.IrrecoverableException: > Coaster service ended. Reason: null > stdout: > stderr: > at > org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:230) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:253) > at > org.globus.cog.abstraction.impl.ssh.execution.JobSubmissionTaskHandler.SSHTaskStatusChanged(JobSubmissionTaskHandler.java:193) > at > org.globus.cog.abstraction.impl.ssh.SSHRunner.notifyListeners(SSHRunner.java:84) > at org.globus.cog.abstraction.impl.ssh.SSHRunner.run(SSHRunner.java:43) > at java.lang.Thread.run(Thread.java:595) > 2010-05-27 13:25:52,843-0500 INFO AbstractStreamKarajanChannel > 1427072207: Channel shut down > java.lang.Throwable > at > org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.close(AbstractTCPChannel.java:97) > at > org.globus.cog.karajan.workflow.service.channels.MetaChannel.close(MetaChannel.java:87) > at > 
org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:232) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:253) > at > org.globus.cog.abstraction.impl.ssh.execution.JobSubmissionTaskHandler.SSHTaskStatusChanged(JobSubmissionTaskHandler.java:193) > at > org.globus.cog.abstraction.impl.ssh.SSHRunner.notifyListeners(SSHRunner.java:84) > at org.globus.cog.abstraction.impl.ssh.SSHRunner.run(SSHRunner.java:43) > at java.lang.Thread.run(Thread.java:595) > 2010-05-27 13:25:52,843-0500 INFO ConnectionProtocol Freeing channel 4 > [Unnamed Channel] > 2010-05-27 13:25:58,866-0500 INFO > AbstractStreamKarajanChannel$Multiplexer No streams > 2010-05-27 13:26:08,883-0500 INFO > AbstractStreamKarajanChannel$Multiplexer No streams > 2010-05-27 13:26:18,905-0500 INFO > AbstractStreamKarajanChannel$Multiplexer No streams > 2010-05-27 13:26:28,920-0500 INFO > AbstractStreamKarajanChannel$Multiplexer No streams > 2010-05-27 13:26:38,924-0500 INFO > AbstractStreamKarajanChannel$Multiplexer No streams > 2010-05-27 13:26:48,934-0500 INFO > AbstractStreamKarajanChannel$Multiplexer No streams > 2010-05-27 13:26:56,456-0500 INFO TransportProtocolCommon Sending > SSH_MSG_DISCONNECT > 2010-05-27 13:26:56,456-0500 INFO Service ssh-connection thread is exiting > > It seems something went wrong after the BoostThreader job is finished. > So the swift engine never gets the result back and run into endless waiting. 
> And in the coaster.log, there also are some error messages : > > 2010-05-27 13:25:46,741-0500 WARN Command Command(38, JOBSTATUS): > handling reply timeout; sendReqTime=100527-132346.736, > sendTime=100527-132346.737, now=100527-132546.741 > 2010-05-27 13:25:46,741-0500 WARN Command Command(38, JOBSTATUS)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) > at java.util.TimerThread.mainLoop(Timer.java:512) > at java.util.TimerThread.run(Timer.java:462) > 2010-05-27 13:25:46,851-0500 WARN Command Command(39, JOBSTATUS): > handling reply timeout; sendReqTime=100527-132346.847, > sendTime=100527-132346.848, now=100527-132546.851 > 2010-05-27 13:25:46,851-0500 WARN Command Command(39, JOBSTATUS)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) > at java.util.TimerThread.mainLoop(Timer.java:512) > at java.util.TimerThread.run(Timer.java:462) > 2010-05-27 13:25:47,335-0500 WARN Command Command(40, JOBSTATUS): > handling reply timeout; sendReqTime=100527-132347.331, > sendTime=100527-132347.332, now=100527-132547.335 > 2010-05-27 13:25:47,335-0500 WARN Command Command(40, JOBSTATUS)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274) > at java.util.TimerThread.mainLoop(Timer.java:512) > at java.util.TimerThread.run(Timer.java:462) > 2010-05-27 13:25:47,478-0500 INFO Cpu 
0527-011146-000000:3 pull
> 2010-05-27 13:25:47,804-0500 INFO BlockQueueProcessor Updated allocsize: 6.519897787948322
> 2010-05-27 13:25:47,805-0500 INFO BlockQueueProcessor allocsize = 6.519897787948322, queuedsize = 0.0, qsz = 0
> 2010-05-27 13:25:47,805-0500 INFO BlockQueueProcessor Plan time: 1
> 2010-05-27 13:25:48,084-0500 WARN Command Command(41, JOBSTATUS): handling reply timeout; sendReqTime=100527-132348.080, sendTime=100527-132348.081, now=100527-132548.084
> 2010-05-27 13:25:48,084-0500 WARN Command Command(41, JOBSTATUS)fault was: Reply timeout
> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
>   at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269)
>   at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274)
>   at java.util.TimerThread.mainLoop(Timer.java:512)
>   at java.util.TimerThread.run(Timer.java:462)
> 2010-05-27 13:25:48,480-0500 INFO Cpu 0527-011146-000000:0 pull
> 2010-05-27 13:25:49,482-0500 INFO Cpu 0527-011146-000000:5 pull
> 2010-05-27 13:25:49,734-0500 WARN Command Command(42, JOBSTATUS): handling reply timeout; sendReqTime=100527-132349.730, sendTime=100527-132349.731, now=100527-132549.734
> 2010-05-27 13:25:49,734-0500 WARN Command Command(42, JOBSTATUS)fault was: Reply timeout
> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
>   at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:269)
>   at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:274)
>   at java.util.TimerThread.mainLoop(Timer.java:512)
>   at java.util.TimerThread.run(Timer.java:462)
> 2010-05-27 13:25:50,007-0500 INFO Block Shutting down block Block 0527-011146-000000 (6x4200.000s)
> 2010-05-27 13:25:50,009-0500 INFO BlockQueueProcessor Cleaned 1 done blocks
> 2010-05-27 13:25:50,009-0500 INFO BlockQueueProcessor Updated allocsize: 6.519849641186395
>
> The complete log is
attached.
>
> Wenjun
> > Hi Mike,
> > My raptorloop script doesn't work from portal but worked when I run
> > from the attached shell script.
> > After digging up swift and raptor log files,
> > I found out the error message: "cannot open seq file
> > PREPROCESSED/SEQ/T0411D1.seq for read!"
> > BoostThreader is supposed to untar the prepared tar ball to PREPROCESSED.
> > I got a feeling that it was caused by input arguments. So from the
> > portal, the input arguments look like:
> >
> > -target=T0411D1 -maxLoopModel=1 -minLoopModelScore=1.0 -minLoopSize=3
> > -maxLoopsPerModel=10 -templateList=20
> > -prepTar=/gpfs/pads/oops/scienceportal/apache-tomcat-5.5.27/webapps/SIDGridPortal//temp/AE00A497C18DB8885C24D04862A0909A/t0411d1.prep.tar.gz
> >
> > -seqFile=/gpfs/pads/oops/scienceportal/apache-tomcat-5.5.27/webapps/SIDGridPortal//temp/1D8389428752F99E3C0A14789C07F55C/t0411d1.fasta
> > -templatesPerJob=4
> > -nModels=10
> >
> >
> > And the successful raptor run has the following arguments:
> > -target=T0411D1 \
> > -seqFile=/home/aashish/testPrep/T0411D1.fasta \
> > -prepTar=/gpfs/pads/oops/scienceportal/apache-tomcat-5.5.27/webapps/SIDGridPortal/temp/AE00A497C18DB8885C24D04862A0909A/t0411d1.prep.tar.gz \
> > -templatesPerJob=4 -templateList=20 -nModels=10 -nSim=4 \
> > -loopRunParamFile=$(pwd)/loopmodels.param \
> > -maxLoopModels=1 \
> > -minLoopModelScore=1.0 \
> > -minLoopSize=3 \
> > -maxLoopsPerModel=10
> >
> > Could you figure out any reason for this problem?
> >
> > Wenjun
> >
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From wilde at mcs.anl.gov Sat May 29 10:39:06 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 29 May 2010 10:39:06 -0500 (CDT)
Subject: [Swift-devel] Check tutorial/guide edits in and let student(s) know?
Message-ID: <8982478.217121275147546168.JavaMail.root@zimbra>

David, since the current online tutorial is already broken, and yours is likely a big improvement, you should go ahead and check it in and make sure it gets posted to the live guides. (We might need to add you to some unix group for that to work.)

Then work with the other summer students, and the student who posted to the list on Friday, to get some feedback as to whether the tutorial is now error-free.

Another thing that occurs to me: as the set of posted examples/tutorials grows, can we create a "recipe index" that indexes the examples by a categorized outline of commonly needed techniques and FAQs?

Lastly: we may want to separate out examples (mainly for enhancing the user guide) from tutorials, where the latter would be mainly formatted as exercises that one could actually walk through, as opposed to examples that one simply reads, copies, and tries. The latter (tutorials) are much more work and harder to test, and thus we could simply enhance the tutorials that already exist while we focus more on writing a larger set of tested and annotated examples.

- Mike

From dk0966 at cs.ship.edu Mon May 31 06:43:50 2010
From: dk0966 at cs.ship.edu (David Kelly)
Date: Mon, 31 May 2010 07:43:50 -0400
Subject: [Swift-devel] Fwd: [Swift-user] Question: Setting environment variables
In-Reply-To: <1275067038.22631.1.camel@blabla2.none>
References: <19692190.197791275066605226.JavaMail.root@zimbra> <1275067038.22631.1.camel@blabla2.none>
Message-ID:

Hello,

It looks like this issue has already been resolved. Inside catalog/transformations/File.java, there is a regular expression which should take care of any combination of tabs and spaces. I tested this out by adding random spaces/tabs inside of tc.data. I will resolve the open bug for this, #66, since it appears to have been fixed.
David

On Fri, May 28, 2010 at 1:17 PM, Mihael Hategan wrote:
> On Fri, 2010-05-28 at 12:10 -0500, Michael Wilde wrote:
> > David, we should fix this whitespace issue in parsing tc.data. It's a
> > common problem that new users (and old) encounter.
> >
> > We held off on this because the original code to parse that file comes
> > from the older VDS system and is now I think supported by Pegasus.
> > Either way, we should fix it: first see if Pegasus has it fixed; else
> > fix it ourselves and post them a fix.
>
> A while ago I removed the VDS dependency and extracted only the few
> files that we were using from there. They are now in the swift svn. So I
> don't think we have to bother with going through Pegasus, though a patch
> for them may be the nice thing to do.
>
> Mihael
>

From bugzilla-daemon at mcs.anl.gov Mon May 31 06:44:59 2010
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 31 May 2010 06:44:59 -0500 (CDT)
Subject: [Swift-devel] [Bug 66] tc.data gets upset when whitespace between entries is not exactly one tab.
In-Reply-To:
References:
Message-ID: <20100531114459.9C2442D0A3@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=66

David Kelly changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |dk0966 at cs.ship.edu
         Resolution|                            |FIXED

--- Comment #3 from David Kelly 2010-05-31 06:44:59 ---
Issue appears to be resolved. Tested by adding random tabs and spaces to
tc.data and everything worked fine.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
You are watching the reporter.
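
For readers following bug 66, the behavior being tested (treating any mix of tabs and spaces between tc.data entries as a single separator) can be sketched roughly as below. This is only an illustration in Python, not the regular expression in catalog/transformations/File.java, which is not quoted in this thread. The helper name split_tc_entry, the sample entry, and the handling of '#' comment lines are all assumptions made for the example.

```python
import re

# Any run of spaces and/or tabs counts as one field separator.
# (Assumed pattern for illustration; not taken from File.java.)
_FIELD_SEPARATOR = re.compile(r"[ \t]+")


def split_tc_entry(line):
    """Split one tc.data-style line into fields (hypothetical helper).

    Returns None for blank lines and for lines starting with '#',
    which this sketch assumes are comments.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    return _FIELD_SEPARATOR.split(stripped)


if __name__ == "__main__":
    # Made-up entry: once with exactly one tab between fields,
    # once with random tabs and spaces, as in the test described above.
    tidy = "localhost\techo\t/bin/echo\tINSTALLED\tINTEL32::LINUX\tnull"
    messy = "  localhost \t  echo\t\t/bin/echo   INSTALLED \t INTEL32::LINUX\t null  "
    print(split_tc_entry(tidy) == split_tc_entry(messy))
```

Stripping the line before splitting keeps leading or trailing whitespace from producing empty first or last fields, which is the kind of case a "random tabs and spaces" test would exercise.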