From ketan at mcs.anl.gov Mon May 2 16:03:03 2011
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Mon, 2 May 2011 16:03:03 -0500
Subject: [Swift-devel] osg question: how to find sites' health
Message-ID: <0B5CB7C5-0380-44D3-9A4E-FC512D71F31A@mcs.anl.gov>

Hi Allan,

I am trying to reuse your work on OSG that you did for extenci. So am using your scripts from allantools/..

A quick question about OSG: How do you find the health of participating sites?

On EGI we have something called "lcg-infosites" series of commands that do this.

Thanks,
Ketan

From aespinosa at cs.uchicago.edu Mon May 2 16:32:13 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 2 May 2011 16:32:13 -0500
Subject: [Swift-devel] Re: osg question: how to find sites' health
In-Reply-To: <0B5CB7C5-0380-44D3-9A4E-FC512D71F31A@mcs.anl.gov>
References: <0B5CB7C5-0380-44D3-9A4E-FC512D71F31A@mcs.anl.gov>
Message-ID:

Hi Ketan,

Most of the time I just query the ReSS condor pool (condor_status -pool engage-central.renci.org) then look for the following classads:

GlueCEInfoTotalCPUs
GlueCEInfo*Jobs* <= jobs running, total acceptable jobs, free cores, etc.

The OSG monitoring webpages (gratia, rsv) also have related information.

2011/5/2 Ketan Maheshwari :
> Hi Allan,
>
> I am trying to reuse your work on OSG that you did for extenci. So am using your scripts from allantools/..
>
> A quick question about OSG: How do you find the health of participating sites?
>
> On EGI we have something called "lcg-infosites" series of commands that do this.
>
> Thanks,
> Ketan
>

--
Allan M.
Espinosa
PhD student, Computer Science
University of Chicago

From wilde at mcs.anl.gov Mon May 2 18:29:46 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 2 May 2011 18:29:46 -0500 (CDT)
Subject: [Swift-devel] Re: osg question: how to find sites' health
In-Reply-To:
Message-ID: <1734522034.16367.1304378986088.JavaMail.root@zimbra.anl.gov>

Ketan, let's discuss this more tomorrow and report our progress back to the list.

I think "health" is a hard term to define for grid sites. At any given time, each service on each site is either working or not.

- the sites file builder can do various checks
- the checks need to be done under the user's cert to be meaningful
- swift needs to recover from what doesn't get caught by the sites file builder
- clean reporting of errors helps OSG site admins catch and fix problems

I'd like to see Allan's tools merged/extended with a few others, packaged with Swift and documented and tested for Swift users.

Mike

----- Original Message -----
> Hi Ketan,
>
> Most of the time I just query the ReSS condor pool (condor_status
> -pool engage-central.renci.org) then look for the following classads:
>
> GlueCEInfoTotalCPUs
> GlueCEInfo*Jobs* <= jobs running, total acceptable jobs, free cores,
> etc.
>
> The OSG monitoring webpages (gratia, rsv) also have related
> information.
>
> 2011/5/2 Ketan Maheshwari :
> > Hi Allan,
> >
> > I am trying to reuse your work on OSG that you did for extenci. So
> > am using your scripts from allantools/..
> >
> > A quick question about OSG: How do you find the health of
> > participating sites?
> >
> > On EGI we have something called "lcg-infosites" series of commands
> > that do this.
> >
> > Thanks,
> > Ketan
> >
>
> --
> Allan M.
Espinosa
> PhD student, Computer Science
> University of Chicago
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dsk at ci.uchicago.edu Tue May 3 09:53:50 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Tue, 3 May 2011 09:53:50 -0500
Subject: [Swift-devel] Re: osg question: how to find sites' health
In-Reply-To: <1734522034.16367.1304378986088.JavaMail.root@zimbra.anl.gov>
References: <1734522034.16367.1304378986088.JavaMail.root@zimbra.anl.gov>
Message-ID: <8C7B2834-A692-408D-9D4C-9F8FAF2DE1FC@ci.uchicago.edu>

I think this would be a useful thing for Allan to work on under ExTENCI, after this week, and secondary to getting the SCEC code running and tested.

I'm also trying to get more ExTENCI funds, and a general purpose tool might be a good deliverable for that.

Dan

On May 2, 2011, at 6:29 PM, Michael Wilde wrote:

> Ketan, let's discuss this more tomorrow and report our progress back to the list.
>
> I think "health" is a hard term to define for grid sites. At any given time, each service on each site is either working or not.
>
> - the sites file builder can do various checks
> - the checks need to be done under the user's cert to be meaningful
> - swift needs to recover from what doesn't get caught by the sites file builder
> - clean reporting of errors helps OSG site admins catch and fix problems
>
> I'd like to see Allan's tools merged/extended with a few others, packaged with Swift and documented and tested for Swift users.
>
> Mike
>
>
> ----- Original Message -----
>> Hi Ketan,
>>
>> Most of the time I just query the ReSS condor pool (condor_status
>> -pool engage-central.renci.org) then look for the following classads:
>>
>> GlueCEInfoTotalCPUs
>> GlueCEInfo*Jobs* <= jobs running, total acceptable jobs, free cores,
>> etc.
>>
>> The OSG monitoring webpages (gratia, rsv) also have related
>> information.
>>
>> 2011/5/2 Ketan Maheshwari :
>>> Hi Allan,
>>>
>>> I am trying to reuse your work on OSG that you did for extenci. So
>>> am using your scripts from allantools/..
>>>
>>> A quick question about OSG: How do you find the health of
>>> participating sites?
>>>
>>> On EGI we have something called "lcg-infosites" series of commands
>>> that do this.
>>>
>>> Thanks,
>>> Ketan
>>>
>>
>> --
>> Allan M. Espinosa
>> PhD student, Computer Science
>> University of Chicago
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-6818 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From skenny at uchicago.edu Tue May 3 15:33:48 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 3 May 2011 13:33:48 -0700
Subject: [Swift-devel] anyone else having svn troubles?
Message-ID:

[skenny at bridled tmp]$ svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk swift
svn: Can't find a temporary directory: Internal error

From ketancmaheshwari at gmail.com Tue May 3 15:35:35 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Tue, 3 May 2011 15:35:35 -0500
Subject: [Swift-devel] anyone else having svn troubles?
In-Reply-To:
References:
Message-ID:

Yes, I just filed a ci support ticket.

--Ketan

On May 3, 2011, at 3:33 PM, Sarah Kenny wrote:

> [skenny at bridled tmp]$ svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk swift
> svn: Can't find a temporary directory: Internal error

From davidkelly999 at gmail.com Wed May 4 15:58:37 2011
From: davidkelly999 at gmail.com (David Kelly)
Date: Wed, 4 May 2011 16:58:37 -0400
Subject: [Swift-devel] Website documentation
Message-ID:

Hello,

I just finished the preliminary version of the user guide in asciidoc. It is at http://www.ci.uchicago.edu/~davidk/userguide.html. The asciidoc source is at http://www.ci.uchicago.edu/~davidk/userguide.txt.

David

From wilde at mcs.anl.gov Wed May 4 17:33:48 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 4 May 2011 17:33:48 -0500 (CDT)
Subject: [Swift-devel] Re: Website documentation
In-Reply-To:
Message-ID: <407116320.26591.1304548428092.JavaMail.root@zimbra.anl.gov>

Nice job, David! This really looks great. Awesome, in fact, for a first cut.
One thing that jumped out at me: in the swift.properties section:

- the actual descriptions should be in plain text, not fixed width text

- to keep the descriptions in sync with the sample/default swift.properties file in etc/, perhaps the sample file could be coded in such a way that we can grep the properties out in a "doc build" process that runs as part of ant or make, and automatically include them in the users guide.

- Perhaps move the properties to an appendix rather than a chapter - the chapter could just talk about how to manage properties, and what the general "families" of properties are.

Mike

----- Original Message -----

Hello,

I just finished the preliminary version of the user guide in asciidoc. It is at http://www.ci.uchicago.edu/~davidk/userguide.html. The asciidoc source is at http://www.ci.uchicago.edu/~davidk/userguide.txt.

David

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dsk at ci.uchicago.edu Thu May 5 08:18:47 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Thu, 5 May 2011 08:18:47 -0500
Subject: [Swift-devel] Website documentation
In-Reply-To:
References:
Message-ID:

We need someone new to go through this... Perhaps we could ask Lorenzo to do this?

Just skimming...

I think the table in 2.11 should only have one operator in each row

I'm unclear about the first table in 3.1. perhaps there should be some code or text before it?

On May 4, 2011, at 3:58 PM, David Kelly wrote:

> Hello,
>
> I just finished the preliminary version of the user guide in asciidoc. It is at http://www.ci.uchicago.edu/~davidk/userguide.html. The asciidoc source is at http://www.ci.uchicago.edu/~davidk/userguide.txt.
>
> David
>

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-6818 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From benc at hawaga.org.uk Thu May 5 08:23:03 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 5 May 2011 13:23:03 +0000 (GMT)
Subject: [Swift-devel] Website documentation
In-Reply-To:
References:
Message-ID:

> I'm unclear about the first table in 3.1. perhaps there should be some code or text before it?

do you mean the table that shows examples of swift variable expressions and the filenames they mapped to?

I never figured out what a nice way to present this mapping was... somehow both the version in docbook and this docbook don't work very well. something more graphical or colourful might help (eg using red for invalid mappings rather than the word INVALID)

--

From ketancmaheshwari at gmail.com Thu May 5 08:27:49 2011
From: ketancmaheshwari at gmail.com (ketan)
Date: Thu, 05 May 2011 08:27:49 -0500
Subject: [Swift-devel] Website documentation
In-Reply-To:
References:
Message-ID: <4DC2A5D5.3090709@gmail.com>

A small comment on tables, you can shrink the columns to the size of text by width and cols specification in asciidoc.

For instance, the following will allocate 3 parts to first col and 10 to second keeping width of table to 70%:

[width="70%", cols="^3,10", options="header"]

On 5/4/11 3:58 PM, David Kelly wrote:
> Hello,
>
> I just finished the preliminary version of the user guide in asciidoc.
> It is at http://www.ci.uchicago.edu/~davidk/userguide.html. The asciidoc
> source is at http://www.ci.uchicago.edu/~davidk/userguide.txt.
>
> David
>

From dsk at ci.uchicago.edu Thu May 5 08:34:11 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Thu, 5 May 2011 08:34:11 -0500
Subject: [Swift-devel] Website documentation
In-Reply-To:
References:
Message-ID: <1AB74874-319B-427C-8302-5AAF2A2ED3AC@ci.uchicago.edu>

On May 5, 2011, at 8:23 AM, Ben Clifford wrote:

>> I'm unclear about the first table in 3.1. perhaps there should be some code or text before it?
>
> do you mean the table that shows examples of swift variable expressions
> and the filenames they mapped to?
>
> I never figured out what a nice way to present this mapping was... somehow
> both the version in docbook and this docbook don't work very well.
> something more graphical or colourful might help (eg using red for invalid
> mappings rather than the word INVALID)

yes. What am I supposed to get from this table? f is mapped to myfile? But how was it mapped? Where was f defined? f[o] is mapped to INVALID? What does this mean?

Maybe the code below the table should be moved before the table, and myfile in the table should be changed to plot_outfile_param? Or maybe a template of a single_file_mapper is needed before the table?

Dan

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-6818 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From tim.g.armstrong at gmail.com Thu May 5 08:44:22 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Thu, 5 May 2011 08:44:22 -0500
Subject: [Swift-devel] CDM
Message-ID:

Hi All,
I'm looking at trying to use collective data management to broadcast some shared data files for SwiftR.
I have found the CDM section in the swift user guide, but are there any additional resources to help me understand how CDM in swift works?

- Tim

From wozniak at mcs.anl.gov Thu May 5 08:53:05 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 5 May 2011 08:53:05 -0500 (Central Daylight Time)
Subject: [Swift-devel] CDM
In-Reply-To:
References:
Message-ID:

Hi Tim
I'm out of town today and tomorrow but I'll check on this when I'm back. CDM broadcast is only designed for the BG/P, but we could plug in support for other systems if there is an available broadcast mechanism.
Justin

On Thu, 5 May 2011, Tim Armstrong wrote:

> Hi All,
> I'm looking at trying to use collective data management to broadcast some
> shared data files for SwiftR. I have found the CDM section in the swift
> user guide, but are there any additional resources to help me understand how
> CDM in swift works?
>
> - Tim
>

--
Justin M Wozniak

From wilde at mcs.anl.gov Thu May 5 10:34:39 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 5 May 2011 10:34:39 -0500 (CDT)
Subject: [Swift-devel] CDM
In-Reply-To:
Message-ID: <760530538.29706.1304609679640.JavaMail.root@zimbra.anl.gov>

We should in the interim work up a cookbook entry for CDM.

I suspect we can start with the example of how we use cdm DIRECT for modftdock on beagle. I suspect that's what you want, Tim?

For this, we used the following rule in file "cdm":

rule .* DIRECT /lustre/beagle/wilde/mp/run04

...and added the option "-fs.file cdm" to the swift command line.

(where run04 was the directory I ran the swift command from, and below which the script's input and output files resided. The files were mapped to relative paths, and the DIRECT rule instructed swift to bypass staging of files to/from the work directory.)

- Mike

----- Original Message -----
> Hi Tim
> I'm out of town today and tomorrow but I'll check on this when I'm
> back.
CDM broadcast is only designed for the BG/P, but we could plug in
> support for other systems if there is an available broadcast
> mechanism.
> Justin
>
> On Thu, 5 May 2011, Tim Armstrong wrote:
>
> > Hi All,
> > I'm looking at trying to use collective data management to
> > broadcast some
> > shared data files for SwiftR. I have found the CDM section in the
> > swift
> > user guide, but are there any additional resources to help me
> > understand how
> > CDM in swift works?
> >
> > - Tim
> >
>
> --
> Justin M Wozniak

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Thu May 5 10:47:52 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 5 May 2011 10:47:52 -0500 (CDT)
Subject: [Swift-devel] Website documentation
In-Reply-To:
Message-ID: <1472977275.29786.1304610472373.JavaMail.root@zimbra.anl.gov>

Yes, indeed. We also have a long list of changes we want to make in the user guide; some of these are in bugzilla; others are in private notes we need to turn into doc bugs.

Ben's initial Users Guide has served us well, but now it's time for some major (but incremental) revamping.

Some of the material from the ParCo paper can go into the Users Guide.
Also:

- more thorough explanation of the basic data and execution model
- more clear explanation of mapping (ParCo has some of that) with more examples and explanations of techniques
- improved diagrams of the runtime environment
- more explanation of runtime environment issues like retry and replication (beyond what's in the properties descriptions)
- lots of clarification needed on Coasters and providers in general
- section on script development and debugging
- annotated examples (can start with ParCo examples; need some simpler ones too)

In general - much work ahead of us here.

- Mike

----- Original Message -----

We need someone new to go through this... Perhaps we could ask Lorenzo to do this?

Just skimming...

I think the table in 2.11 should only have one operator in each row

I'm unclear about the first table in 3.1. perhaps there should be some code or text before it?

On May 4, 2011, at 3:58 PM, David Kelly wrote:

Hello,

I just finished the preliminary version of the user guide in asciidoc. It is at http://www.ci.uchicago.edu/~davidk/userguide.html. The asciidoc source is at http://www.ci.uchicago.edu/~davidk/userguide.txt.

David

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-6818 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From tim.g.armstrong at gmail.com Thu May 5 11:47:36 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Thu, 5 May 2011 11:47:36 -0500
Subject: [Swift-devel] CDM
In-Reply-To: <760530538.29706.1304609679640.JavaMail.root@zimbra.anl.gov>
References: <760530538.29706.1304609679640.JavaMail.root@zimbra.anl.gov>
Message-ID:

Yes, I'm just starting to think about how best to implement swiftExport() now that I'm staging the shared data files through Swift. I wasn't sure for what systems CDM was implemented. I can see that it would be handy on Beagle if there are large shared data sets

- Tim

On Thu, May 5, 2011 at 10:34 AM, Michael Wilde wrote:

> We should in the interim work up a cookbook entry for CDM.
>
> I suspect we can start with the example of how we use cdm DIRECT for
> modftdock on beagle. I suspect that's what you want, Tim?
>
> For this, we used the following rule in file "cdm":
>
> rule .* DIRECT /lustre/beagle/wilde/mp/run04
>
> ...and added the option "-fs.file cdm" to the swift command line.
>
> (where run04 was the directory I ran the swift command from, and below
> which the script's input and output files resided. The files were mapped to
> relative paths, and the DIRECT rule instructed swift to bypass staging of
> files to/from the work directory.)
>
> - Mike
>
>
> ----- Original Message -----
> > Hi Tim
> > I'm out of town today and tomorrow but I'll check on this when I'm
> > back. CDM broadcast is only designed for the BG/P, but we could plug
> > in
> > support for other systems if there is an available broadcast
> > mechanism.
> > Justin
> >
> > On Thu, 5 May 2011, Tim Armstrong wrote:
> >
> > > Hi All,
> > > I'm looking at trying to use collective data management to
> > > broadcast some
> > > shared data files for SwiftR. I have found the CDM section in the
> > > swift
> > > user guide, but are there any additional resources to help me
> > > understand how
> > > CDM in swift works?
> > >
> > > - Tim
> > >
> >
> > --
> > Justin M Wozniak
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>

From skenny at uchicago.edu Thu May 5 13:27:56 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Thu, 5 May 2011 11:27:56 -0700
Subject: [Swift-devel] vacation days 5/13 & 5/16
Message-ID:

hi all, just a heads-up i will be away from the lab/off-line on friday 5/13 and monday 5/16

From wilde at mcs.anl.gov Fri May 6 11:35:53 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 6 May 2011 11:35:53 -0500 (CDT)
Subject: [Swift-devel] sites.xml settings for Beagle in Swift trunk?
In-Reply-To: <1625204505.35455.1304699599783.JavaMail.root@zimbra.anl.gov>
Message-ID: <426200653.35490.1304699753032.JavaMail.root@zimbra.anl.gov>

Ketan, Justin,

Did you work out yesterday the right sites.xml settings for Coasters-PBS-Cray-aprun?

I think this page was the start of your documentation for those, Justin:
https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider

Ketan, can you move these to the new cookbook, and work out any issues to make sure they are correct?

Did you both determine that the trunk code is working OK, or that it has bugs that still prevent it from running correctly on Beagle?

Thanks,

- Mike

From ketan at mcs.anl.gov Fri May 6 11:50:56 2011
From: ketan at mcs.anl.gov (ketan)
Date: Fri, 06 May 2011 11:50:56 -0500
Subject: [Swift-devel] Re: sites.xml settings for Beagle in Swift trunk?
In-Reply-To: <426200653.35490.1304699753032.JavaMail.root@zimbra.anl.gov>
References: <426200653.35490.1304699753032.JavaMail.root@zimbra.anl.gov>
Message-ID: <4DC426F0.4050407@mcs.anl.gov>

On 5/6/11 11:35 AM, Michael Wilde wrote:
> Ketan, Justin,
>
> Did you work out yesterday the right sites.xml settings for Coasters-PBS-Cray-aprun?
>
> I think this page was the start of your documentation for those, Justin:
> https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider
>
> Ketan, can you move these to the new cookbook, and work out any issues to make sure they are correct?
>
> Did you both determine that the trunk code is working OK, or that it has bugs that still prevent it from running correctly on Beagle?

I tested yesterday with the trunk code on beagle with new coaster-site attributes and this bug is not in anymore.
However, testing the trunk code on a large run is still not concluded. I will report as soon as a big run gets underway on beagle.

> Thanks,
>
> - Mike

From wilde at mcs.anl.gov Fri May 6 12:00:23 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 6 May 2011 12:00:23 -0500 (CDT)
Subject: [Swift-devel] Re: sites.xml settings for Beagle in Swift trunk?
In-Reply-To: <4DC426F0.4050407@mcs.anl.gov>
Message-ID: <259833798.35643.1304701223468.JavaMail.root@zimbra.anl.gov>

So that others can test as well, could you post the latest recommended sites.xml settings for *trunk code* for Beagle?

How will we help users change from the current sites settings in use for the 0.92-branch Beagle-support mods to the new trunk settings when that goes live in 0.93?

Will gensites be able to help us with the transition?

Thanks,

Mike

----- Original Message -----
> On 5/6/11 11:35 AM, Michael Wilde wrote:
> > Ketan, Justin,
> >
> > Did you work out yesterday the right sites.xml settings for
> > Coasters-PBS-Cray-aprun?
> >
> > I think this page was the start of your documentation for those,
> > Justin:
> > https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider
> >
> > Ketan, can you move these to the new cookbook, and work out any
> > issues to make sure they are correct?
> >
> > Did you both determine that the trunk code is working OK, or that it
> > has bugs that still prevent it from running correctly on Beagle?
> I tested yesterday with the trunk code on beagle with new coaster-site
> attributes and this bug is not in anymore.
> However, testing the trunk code on a large run is still not concluded.
> I will report as soon as a big run gets underway on beagle.
> > Thanks,
> >
> > - Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Fri May 6 14:16:03 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 May 2011 12:16:03 -0700
Subject: [Swift-devel] Re: sites.xml settings for Beagle in Swift trunk?
In-Reply-To: <4DC426F0.4050407@mcs.anl.gov>
References: <426200653.35490.1304699753032.JavaMail.root@zimbra.anl.gov> <4DC426F0.4050407@mcs.anl.gov>
Message-ID: <1304709363.1131.0.camel@blabla2.none>

It may be useful to know what the exact problem was.

Mihael

On Fri, 2011-05-06 at 11:50 -0500, ketan wrote:
> On 5/6/11 11:35 AM, Michael Wilde wrote:
> > Ketan, Justin,
> >
> > Did you work out yesterday the right sites.xml settings for Coasters-PBS-Cray-aprun?
> >
> > I think this page was the start of your documentation for those, Justin:
> > https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider
> >
> > Ketan, can you move these to the new cookbook, and work out any issues to make sure they are correct?
> >
> > Did you both determine that the trunk code is working OK, or that it has bugs that still prevent it from running correctly on Beagle?
> I tested yesterday with the trunk code on beagle with new coaster-site
> attributes and this bug is not in anymore.
> However, testing the trunk code on a large run is still not concluded. I
> will report as soon as a big run gets underway on beagle.
> > Thanks,
> >
> > - Mike
>

From ketan at mcs.anl.gov Fri May 6 14:28:12 2011
From: ketan at mcs.anl.gov (ketan)
Date: Fri, 06 May 2011 14:28:12 -0500
Subject: [Swift-devel] Re: sites.xml settings for Beagle in Swift trunk?
In-Reply-To: <1304709363.1131.0.camel@blabla2.none>
References: <426200653.35490.1304699753032.JavaMail.root@zimbra.anl.gov> <4DC426F0.4050407@mcs.anl.gov> <1304709363.1131.0.camel@blabla2.none>
Message-ID: <4DC44BCC.4090102@mcs.anl.gov>

Mihael,

For PBS provider, Justin made enhancement through which multiple provider attributes could be specified in sites.xml, separated by semicolon using the "providerAttribute" key as follows:

pbs.aprun
pbs.mpp=true

While the older ppn key parsing was modified to parse just the integer instead of the colon separated string of previous version:

24:cray:pack

I failed to notice this change in trunk and retained the above line in my sites.xml. This was causing the sites.xml parsing to fail and hence nothing was written on the ~/.globus/script/PBS*.submit file.

Still, this empty submit file was getting submitted which made Swift think that jobs are submitted and so it was emitting a "nnn jobs submitted" status.

Only to find that nothing was submitted.

The problem disappeared simply by removing the above colon separated line.

I would suggest, we continue to support both versions for PBS provider until some later version, say 0.95 before phasing into the new setup.

Ketan

On 5/6/11 2:16 PM, Mihael Hategan wrote:
> It may be useful to know what the exact problem was.
>
> Mihael
>
> On Fri, 2011-05-06 at 11:50 -0500, ketan wrote:
>> On 5/6/11 11:35 AM, Michael Wilde wrote:
>>> Ketan, Justin,
>>>
>>> Did you work out yesterday the right sites.xml settings for Coasters-PBS-Cray-aprun?
>>>
>>> I think this page was the start of your documentation for those, Justin:
>>> https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider
>>>
>>> Ketan, can you move these to the new cookbook, and work out any issues to make sure they are correct?
>>>
>>> Did you both determine that the trunk code is working OK, or that it has bugs that still prevent it from running correctly on Beagle?
>> I tested yesterday with the trunk code on beagle with new coaster-site
>> attributes and this bug is not in anymore.
>> However, testing the trunk code on a large run is still not concluded. I
>> will report as soon as a big run gets underway on beagle.
>>> Thanks,
>>>
>>> - Mike
>

From hategan at mcs.anl.gov Fri May 6 14:32:51 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 06 May 2011 12:32:51 -0700
Subject: [Swift-devel] Re: sites.xml settings for Beagle in Swift trunk?
In-Reply-To: <4DC44BCC.4090102@mcs.anl.gov>
References: <426200653.35490.1304699753032.JavaMail.root@zimbra.anl.gov> <4DC426F0.4050407@mcs.anl.gov> <1304709363.1131.0.camel@blabla2.none> <4DC44BCC.4090102@mcs.anl.gov>
Message-ID: <1304710371.1567.0.camel@blabla2.none>

Ok. Thanks.
Mihael

On Fri, 2011-05-06 at 14:28 -0500, ketan wrote:
> Mihael,
>
> For PBS provider, Justin made enhancement through which multiple
> provider attributes could be specified in sites.xml, separated by
> semicolon using the "providerAttribute" key as follows:
>
>
> pbs.aprun
> pbs.mpp=true
>
>
> While the older ppn key parsing was modified to parse just the integer
> instead of the colon separated string of previous version:
>
> 24:cray:pack
>
> I failed to notice this change in trunk and retained the above
> line in my sites.xml. This was causing the sites.xml parsing to fail and
> hence nothing was written on the ~/.globus/script/PBS*.submit file.
>
> Still, this empty submit file was getting submitted which made Swift
> think that jobs are submitted and so it was emitting a "nnn jobs
> submitted" status.
>
> Only to find that nothing was submitted.
>
> The problem disappeared simply by removing the above colon separated line.
>
>
> I would suggest, we continue to support both versions for PBS provider
> until some later version, say 0.95 before phasing into the new setup.
>
>
> Ketan
>
>
>
> On 5/6/11 2:16 PM, Mihael Hategan wrote:
> > It may be useful to know what the exact problem was.
> >
> > Mihael
> >
> > On Fri, 2011-05-06 at 11:50 -0500, ketan wrote:
> >> On 5/6/11 11:35 AM, Michael Wilde wrote:
> >>> Ketan, Justin,
> >>>
> >>> Did you work out yesterday the right sites.xml settings for Coasters-PBS-Cray-aprun?
> >>>
> >>> I think this page was the start of your documentation for those, Justin:
> >>> https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider
> >>>
> >>> Ketan, can you move these to the new cookbook, and work out any issues to make sure they are correct?
> >>>
> >>> Did you both determine that the trunk code is working OK, or that it has bugs that still prevent it from running correctly on Beagle?
> >> I tested yesterday with the trunk code on beagle with new coaster-site > >> attributes and this bug is not in anymore. > >> However, testing the trunk code on a large run is still not concluded. I > >> will report as soon as a big run gets underway on beagle. > >>> Thanks, > >>> > >>> - Mike > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From ketan at mcs.anl.gov Fri May 6 14:55:36 2011 From: ketan at mcs.anl.gov (ketan) Date: Fri, 06 May 2011 14:55:36 -0500 Subject: [Swift-devel] Re: sites.xml settings for Beagle in Swift trunk? In-Reply-To: <259833798.35643.1304701223468.JavaMail.root@zimbra.anl.gov> References: <259833798.35643.1304701223468.JavaMail.root@zimbra.anl.gov> Message-ID: <4DC45238.1000500@mcs.anl.gov> Please find attached the correct sites.xml template for trunk version of coasters-PBS provider. Ketan On 5/6/11 12:00 PM, Michael Wilde wrote: > So that others can test as well, could you post the latest recommended sites.xml settings for *trunk code* for Beagle? > > How will we help users change from the current sites settings in use for the 0.92-branch Beagle-support mods to the new trunk settings when that goes live in 0.93? > > Will gensites be able to help us with the transition? > > Thanks, > > Mike > > > ----- Original Message ----- >> On 5/6/11 11:35 AM, Michael Wilde wrote: >>> Ketan, Justin, >>> >>> Did you work out yesterday the right sites.xml settings for >>> Coasters-PBS-Cray-aprun? >>> >>> I think this page was the start of your documentation for those, >>> Justin: >>> https://sites.google.com/site/swiftdevel/internals/providers/coasters-provider >>> >>> Ketan, can you move these to the new cookbook, and work out any >>> issues to make sure they are correct? 
>>> >>> Did you both determine that the trunk code is working OK, or that it >>> has bugs that still prevent it from running correctly on Beagle? >> I tested yesterday with the trunk code on beagle with new coaster-site >> attributes and this bug is not in anymore. >> However, testing the trunk code on a large run is still not concluded. >> I >> will report as soon as a big run gets underway on beagle. >>> Thanks, >>> >>> - Mike -------------- next part -------------- A non-text attachment was scrubbed... Name: sites.template.xml Type: text/xml Size: 1201 bytes Desc: not available URL: From wilde at mcs.anl.gov Fri May 6 18:33:20 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 6 May 2011 18:33:20 -0500 (CDT) Subject: [Swift-devel] Ad-hoc coaster sites Message-ID: <2119943636.37578.1304724800163.JavaMail.root@zimbra.anl.gov> Can anyone who has tested and/or packaged and/or documented scripts for launching ad-hoc pools of nodes as coaster sites send a pointer in response to this query with where they are and what state they are in? We have several people that want to run Swift on clouds, and are looking for the best examples of such scripts to post as a base for people to try cloud execution with. In particular, what is the state of the scripts and tests that were done to run scripts on the MCS compute-server pool? Thanks, - Mike From benc at hawaga.org.uk Mon May 9 02:58:28 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 9 May 2011 07:58:28 +0000 (GMT) Subject: [Swift-devel] Ad-hoc coaster sites In-Reply-To: <2119943636.37578.1304724800163.JavaMail.root@zimbra.anl.gov> References: <2119943636.37578.1304724800163.JavaMail.root@zimbra.anl.gov> Message-ID: > We have several people that want to run Swift on clouds, and are looking > for the best examples of such scripts to post as a base for people to > try cloud execution with. 
It sounds like the nimbus context broker (which might have a different name now) would be useful there -- from what I've seen it seems to be an almost perfect use-case, and would avoid reimplementing a "simpler" version of the same thing. -- From tim.g.armstrong at gmail.com Mon May 9 09:38:06 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Mon, 9 May 2011 09:38:06 -0500 Subject: [Swift-devel] Coding a server in Swift Message-ID: Hi All, for SwiftR, I've been looking at trying to implement a multi-threaded server in Swift, but haven't been quite sure how to go about it. Basically I want to write something that sits in a loop, reading requests from a named pipe on the file system, and forking off threads to handle the request. Something like: iterate { request = readData(inputFile) // block until data available handleRequest(request) // do all the work in a new thread. } My problem is that my (shaky) understanding is that iterate has sequential semantics - that all of the statements within the loop body must finish before the next iteration of the loop starts, which would mean only one request was processed at a time. Am I correct in my understanding? If so, is there a workaround where I could get it to fork off threads like this? - Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Mon May 9 10:37:00 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 9 May 2011 10:37:00 -0500 (CDT) Subject: [Swift-devel] Ad-hoc coaster sites In-Reply-To: Message-ID: <492206562.40777.1304955420412.JavaMail.root@zimbra.anl.gov> Kate asked John Breshnahan to look into this. These questions occur to me: - can the context broker work on multiple clouds (EC2, Magellan, Futuregrid, and Bionimbus) - will it support the "bag of workstations" model? (ie no VMs involved) - how does Swift talk to it? 
Mike ----- Original Message ----- > > We have several people that want to run Swift on clouds, and are > > looking > > for the best examples of such scripts to post as a base for people > > to > > try cloud execution with. > > It sounds like the nimbus context broker (which might have a different > name now) would be useful there -- from what I've seen it seems to be > an > almost perfect use-case, and would avoid reimplementing a "simpler" > version of the same thing. > > -- -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From benc at hawaga.org.uk Mon May 9 10:41:41 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 9 May 2011 15:41:41 +0000 (GMT) Subject: [Swift-devel] condor coasters In-Reply-To: <492206562.40777.1304955420412.JavaMail.root@zimbra.anl.gov> References: <492206562.40777.1304955420412.JavaMail.root@zimbra.anl.gov> Message-ID: > - will it support the "bag of workstations" model? (ie no VMs involved) well now condor was the traditional answer for that, but it didn't work very well in practice. the really big blocker i found on workstation pools was that they tended to not have a shared file system, which was totally at odds with swift's site model of the time. maybe that's not so much of a problem anymore, if there are other ways of staging data. -- From wilde at mcs.anl.gov Mon May 9 10:49:15 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 9 May 2011 10:49:15 -0500 (CDT) Subject: [Swift-devel] Re: condor coasters In-Reply-To: Message-ID: <739480942.40861.1304956155877.JavaMail.root@zimbra.anl.gov> The "bag or workstations model" is of interest for two user groups: - MCS runs 10+ 8-core compute servers (that mount /home). These make a great "site" for initial scaleup of scripts for MCS users. - Users anywhere with a similar capability can do the same. Coaster provider staging gives good support for setups like this that lack a shared filesystem. 
- Mike ----- Original Message ----- > > - will it support the "bag of workstations" model? (ie no VMs > > involved) > > well now condor was the traditional answer for that, but it didn't > work > very well in practice. > > the really big blocker i found on workstation pools was that they > tended > to not have a shared file system, which was totally at odds with > swift's > site model of the time. > > maybe that's not so much of a problem anymore, if there are other ways > of > staging data. > > -- -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From benc at hawaga.org.uk Mon May 9 11:51:40 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 9 May 2011 16:51:40 +0000 (GMT) Subject: [Swift-devel] Coding a server in Swift In-Reply-To: References: Message-ID: That's a fairly crazy thing to do, I think. Here is a digression which may or may not be relevant at the end. In the past I thought a bit about handling "ever growing" data sets (for example, a new dated file of sensor readings is placed in an input directory every hour or day or week), with Swift causing some processing to happen on each of those files as they appear. In the case of a fixed-size dataset, that's fairly straightforward: map all the files in a directory into an array, and then use foreach to run something on all of those files, generating corresponding output files. At the high level, Swift shouldn't have a problem with a directory that grows: you map a directory that is still growing, and you run a foreach over it. Most of swift can handle an array that is not yet complete - thats core to the parallel behaviour. However the present mapping infrastructure can't deal with that (was my opinion at the time). It might be possible to implement a Java-land function that instead creates the appropriate array object over time rather than through mapping with a mapper. I'm not sure. 
Then, call that "polldir" and you could say something like: a = polldir("/path/to/dir") foreach e in a { ... } I think the behaviour you desire is then fairly straightforwardly expressed. It might be interesting for someone to have a deeper investigation into whether something like polldir could be easily implemented (although in the future, mappers would be the right place, I think) -- http://www.hawaga.org.uk/ben/ From bresnaha at mcs.anl.gov Tue May 10 15:38:28 2011 From: bresnaha at mcs.anl.gov (John Bresnahan) Date: Tue, 10 May 2011 10:38:28 -1000 Subject: [Swift-devel] virtual clusters an swift Message-ID: <4DC9A244.1080404@mcs.anl.gov> Hello all, I am looking to make a repeatable launch program that brings up virtual clusters across cloud domains for use with swift. I have a few questions about what swift assumes about a cluster on which it runs. How does it access the machines? ssh? Is any other base software needed or assumed to be running? Can it deal with clusters that have a single head node with an external IP but all internal IPs for worker nodes? Is there anything else that is assumed about the machines that swift uses? Thanks, John From wilde at mcs.anl.gov Tue May 10 15:59:00 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 10 May 2011 15:59:00 -0500 (CDT) Subject: [Swift-devel] virtual clusters an swift In-Reply-To: <4DC9A244.1080404@mcs.anl.gov> Message-ID: <111489875.49875.1305061140846.JavaMail.root@zimbra.anl.gov> Hi John, For cloud use, Swift uses a resource-provisioning mechanism called "coasters" which is introduced here: http://www.ci.uchicago.edu/swift/guides/userguide.php#coasters and http://wiki.cogkit.org/wiki/Coasters This short paper from the recent CCA-11 workshop explains a bit more on the cloud configuration: http://cca11.org/files/2011/04/p16.pdf To configure for a cloud: on every cloud worker node, start a coaster "worker". This is a Perl script, worker.pl, located in the Swift bin/ dir.
Command line args tell worker.pl to connect either to a Swift client, or to an intermediary Swift coaster-service. Using the coaster-service will allow you to create a long-lived Swift coaster pool that can support the repeated execution of Swift scripts by Swift client tasks. We'll locate and package up for you a simple set of startup scripts that can be run on any collection of hosts that you can ssh to, as well as the scripts that we used on Bionimbus (which used ssh port forwarding). - Mike ----- Original Message ----- > Hello all, > > I am looking to have make a repeatable launch program that brings up > virtual clusters across cloud > domains for use with swift. I have a few questions about what swift > assumes about a cluster on > which it runs. > > How does it access the machines? ssh? Is any other base software > needed or assumed to be running? > Can it deal with clusters that have a single head node with an > external IP but all internal IPs for > worker nodes? > Is there anything else that is assumed about the machines that swift > uses. > > Thanks, > > John > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Tue May 10 16:20:34 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 10 May 2011 14:20:34 -0700 Subject: [Swift-devel] virtual clusters an swift In-Reply-To: <4DC9A244.1080404@mcs.anl.gov> References: <4DC9A244.1080404@mcs.anl.gov> Message-ID: <1305062434.18078.20.camel@blabla2.none> On Tue, 2011-05-10 at 10:38 -1000, John Bresnahan wrote: > Hello all, > > I am looking to have make a repeatable launch program that brings up virtual clusters across cloud > domains for use with swift. 
I have a few questions about what swift assumes about a cluster on > which it runs. > > How does it access the machines? ssh? There are various "providers" that implement how swift accesses a machine. SSH is one of them. GT is another. Those are the remote ones. It can also talk directly to local schedulers, but that's probably not what you want. > Is any other base software needed or assumed to be running? There is an assumption of some kind of shared filesystem. Other than that, it depends. There are various schemes to get things running. One, which Mike mentioned, is a glide-in like thing (called coasters), but the main purpose of that is improved performance (e.g. faster job submission rates, avoiding shared FSs as much as possible). On the other hand, if there are non-trivial costs associated with allocating nodes, that might be worth considering. The shared filesystem requirement can probably be circumvented if coasters are used for file staging. In that case, the "worker" script somehow needs to be made available to the worker nodes. > Can it deal with clusters that have a single head node with an external IP but all internal IPs for > worker nodes? In most cases that's what we assumed to be the case, so yes. When not using coasters, that's pretty much there by default, because all communication is with head node services. When using coasters, requests are piped through the head node which acts as a bridge between WNs and the client. > Is there anything else that is assumed about the machines that swift uses. /bin/bash and standard tools (cp, mv, etc.) on both head node and worker nodes. Coasters also require md5sum or gmd5sum, wget or curl, and java, all on the head node and perl on the worker nodes. There are ways to get around md5 and wget. With coasters, a valid GSI proxy is needed (as well as the proper CA certs on both client and head node). Sadly, that is even in the case when SSH is used. 
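The tool requirements listed above can be checked before a run with a small preflight script. This is only a sketch: the function name `missing_tools` and the two requirement lists are illustrative and not part of Swift, and the check looks for the tools on PATH rather than at fixed locations such as /bin/bash.

```python
import shutil

# Each tuple is a group of alternatives; finding any one of them is enough.
# The groups mirror the list in the message above and are an assumption about
# how one might encode it, not something shipped with Swift.
HEAD_NODE_TOOLS = [
    ("bash",), ("cp",), ("mv",),
    ("md5sum", "gmd5sum"),
    ("wget", "curl"),
    ("java",),
]
WORKER_NODE_TOOLS = [("bash",), ("cp",), ("mv",), ("perl",)]


def missing_tools(requirements, which=shutil.which):
    """Return the alternative groups for which no executable was found on PATH."""
    return [group for group in requirements
            if not any(which(name) for name in group)]


if __name__ == "__main__":
    for label, reqs in (("head node", HEAD_NODE_TOOLS),
                        ("worker node", WORKER_NODE_TOOLS)):
        missing = missing_tools(reqs)
        if missing:
            print(label, "is missing:", [" or ".join(g) for g in missing])
        else:
            print(label, "looks complete")
```

Run it on the head node and on one worker node; the GSI proxy and CA certificate requirements are not covered by this check.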
Mihael From davidkelly999 at gmail.com Wed May 11 04:04:57 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Wed, 11 May 2011 05:04:57 -0400 Subject: [Swift-devel] Re: Website documentation In-Reply-To: <407116320.26591.1304548428092.JavaMail.root@zimbra.anl.gov> References: <407116320.26591.1304548428092.JavaMail.root@zimbra.anl.gov> Message-ID: I have a basic version of the script for generating website docs in /ci/www/projects/swift/guides. It works in a pretty similar way to what we have now - the documents must manually be copied to asciidocs/docs//*.txt and then the script converts the txt files to html and pdf. Instead of having to copy the files manually, I think the next step would be to have it generate these from releases using svn. I will add this to the wiki once it gets a little more finalized. I am also working on some improvements to the documents themselves. The asciidoc version of the tutorial brings in the contents of external .swift scripts when generating the html. Changes to the example scripts should automatically be referenced in the tutorial. I will also try add something like dynamically generating the properties into the user guide as was suggested, as well as other various clean up and fixes. David On Wed, May 4, 2011 at 6:33 PM, Michael Wilde wrote: > Nice job, David! This really looks great. Awesome, in fact, for a first > cut. > > One thing that jumped out at me: in the swift.properties section: > > - the actual descriptions should be in plain text, not fixed width text > > - to keep the descriptions in sync with the sample/default swift.properties > file in etc/, perhaps the sample file could be coded in such a way that we > can grep the properties out in a "doc build" process that runs as part of > ant or make, and automatically include them in the users guide. 
> > - Perhaps move the properties to an appendix rather than a chapter - the > chapter could just talk about how to manage properties, and what the general > "families" of properties are. > > Mike > > > ------------------------------ > > Hello, > > I just finished the preliminary version of the user guide in asciidoc. It > is at http://www.ci.uchicago.edu/~davidk/userguide.html. The asciidoc > source is at http://www.ci.uchicago.edu/~davidk/userguide.txt. > > David > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Wed May 11 04:19:14 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 11 May 2011 09:19:14 +0000 (GMT) Subject: [Swift-devel] virtual clusters an swift In-Reply-To: <1305062434.18078.20.camel@blabla2.none> References: <4DC9A244.1080404@mcs.anl.gov> <1305062434.18078.20.camel@blabla2.none> Message-ID: > One, which Mike mentioned, is a glide-in like thing (called coasters), > but the main purpose of that is improved performance (e.g. faster job > submission rates, avoiding shared FSs as much as possible). On the other > hand, if there are non-trivial costs associated with allocating nodes, > that might be worth considering. This sounds like you see two ways of doing things: one-vm-per-swift-app-call and one-cluster-of-vms-running-coasters Whenever I thought about this before, something like coasters (or appropriately configured PBS when I first thought about it), seemed the sensible of the two. I think node alloc time (assuming it behaves like EC2 stuff i've done elsewhere) is on the order of booting a machine from scratch. 
-- From hategan at mcs.anl.gov Wed May 11 13:35:04 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 May 2011 11:35:04 -0700 Subject: [Swift-devel] virtual clusters an swift In-Reply-To: References: <4DC9A244.1080404@mcs.anl.gov> <1305062434.18078.20.camel@blabla2.none> Message-ID: <1305138904.9745.7.camel@blabla2.none> On Wed, 2011-05-11 at 09:19 +0000, Ben Clifford wrote: > > One, which Mike mentioned, is a glide-in like thing (called coasters), > > but the main purpose of that is improved performance (e.g. faster job > > submission rates, avoiding shared FSs as much as possible). On the other > > hand, if there are non-trivial costs associated with allocating nodes, > > that might be worth considering. > > This sounds like you see two ways of doing things: > one-vm-per-swift-app-call and one-cluster-of-vms-running-coasters Mostly. I was thinking of a BGP like configuration in which each job allocates and boots a set of worker nodes. What I was trying to say in my own twisted way was that coasters, while providing an integrated and fast solution, are not fundamentally needed. A plain PBS scheme could work as well, and if the jobs aren't too small, various costs may get amortized. > > Whenever I thought about this before, something like coasters (or > appropriately configured PBS when I first thought about it), seemed the > sensible of the two. Right. > > I think node alloc time (assuming it behaves like EC2 stuff i've done > elsewhere) is on the order of booting a machine from scratch. > And right. Same as BGP. Those are environments where you may want to grab nodes and stick to them. 
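To make the amortization point concrete, here is a back-of-the-envelope sketch. The boot time and job durations are made-up illustrative numbers, not measurements from EC2, BGP, or any Swift run, and `overhead_fraction` is a hypothetical helper.

```python
def overhead_fraction(boot_seconds, job_seconds, jobs_per_allocation):
    """Fraction of total node time spent booting rather than running jobs."""
    useful = job_seconds * jobs_per_allocation
    return boot_seconds / (boot_seconds + useful)


BOOT = 120.0  # hypothetical time to allocate and boot a node, in seconds

# (job duration in seconds, number of jobs run on one allocation)
scenarios = {
    "one allocation per short job": (10.0, 1),
    "one allocation per long job": (3600.0, 1),
    "held nodes, many short jobs": (10.0, 1000),
}

if __name__ == "__main__":
    for name, (job_seconds, count) in scenarios.items():
        share = overhead_fraction(BOOT, job_seconds, count)
        print(f"{name}: {share:.1%} of node time is boot overhead")
```

Under these assumed numbers, booting dominates only when each allocation runs a single short job; long jobs or nodes that are held across many jobs make the boot cost small by comparison.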
From aespinosa at cs.uchicago.edu Wed May 11 20:00:35 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 11 May 2011 20:00:35 -0500 Subject: [Swift-devel] Re: Broken pipe on persistent coasters (was Re: Next steps on making the ExTENCI SCEC workflow run reliably) In-Reply-To: <1305152280.18881.6.camel@blabla2.none> References: <1305149802.17175.3.camel@blabla2.none> <1305152280.18881.6.camel@blabla2.none> Message-ID: Redirecting the thread to swift-devel. I did a simple test where I killed off a worker while a single job was being dispatched (persistent coasters, passive workers). $run_service.sh $cat workflow.swift int t = 300; app (external o) sleep_pads(int time) { sleep_pads time; } external o_pads; o_pads = sleep_pads(t); $swift workflow.swift Swift svn swift-r4399 cog-r3087 RunID: 20110511-1908-kv67luid Progress: Find: https://communicado.ci.uchicago.edu:64999 Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:64999 Passive queue processor initialized. Callback URI is http://128.135.125.17:63999 Progress: Submitted:1 Progress: Active:1 Progress: Active:1 Progress: Active:1 Progress: Active:1 Progress: Active:1 ... ... (on a parallel terminal): $/worker.pl http://communicado.ci.uchicago.edu:63999 PADS /scratch Ctrl-C # when the job sleep_pads() started running for a while $ Upon killing the worker, the application terminated as well. But the swift console session still reports the job as being 'Active'. Also, no error has been reported (yet) in the coaster service log. Maybe these will register later after a sufficient amount of time?
i'll report on this again later as the run further progresses but i do get the same last tens of lines from the Swift log: 2011-05-11 19:54:18,868-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:54:28,747-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:54:28,873-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:54:38,754-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:54:38,881-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:54:48,756-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:54:48,885-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:54:58,762-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:54:58,903-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:55:08,768-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:55:08,912-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:55:18,772-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:55:18,921-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:55:28,779-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:55:28,926-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:55:38,784-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:55:38,933-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:55:48,791-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:55:48,954-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:55:58,800-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:55:58,955-0500 INFO 
AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:56:08,808-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:56:08,972-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:56:18,816-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:56:18,974-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:56:28,822-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:56:28,984-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:56:38,825-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:56:38,988-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:56:48,832-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:56:48,999-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:56:58,845-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:56:59,001-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:57:08,846-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:57:09,014-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011-05-11 19:57:18,854-0500 INFO AbstractStreamKarajanChannel Sender 1545215993 queue size: 0 2011-05-11 19:57:19,025-0500 INFO AbstractStreamKarajanChannel$Multiplexer No streams 2011/5/11 Mihael Hategan : > On Wed, 2011-05-11 at 16:42 -0500, Allan Espinosa wrote: >> Right. Workers die because they exceed the maximum walltime. Does the >> coaster service expect the workers to die cleanly (passive ones)? > > Hmm. They aren't expected to die. Which may be a problem. > > We (as in I) need to change that.
Passive workers should advertise their > walltime to the service and the service should take that into account so > that jobs don't get sent to workers who don't have enough time left. > > However, as inefficient as this may be, the service should notify the > client that the jobs that were running on a dying worker have failed, > and those jobs should be restarted by swift. Is that not happening? > > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Wed May 11 20:02:44 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 11 May 2011 20:02:44 -0500 Subject: [Swift-devel] Re: Broken pipe on persistent coasters (was Re: Next steps on making the ExTENCI SCEC workflow run reliably) In-Reply-To: References: <1305149802.17175.3.camel@blabla2.none> <1305152280.18881.6.camel@blabla2.none> Message-ID: 2011/5/11 Allan Espinosa : > 2011/5/11 Mihael Hategan : >> On Wed, 2011-05-11 at 16:42 -0500, Allan Espinosa wrote: >>> Right. Workers die because they exceed the maximum walltime. Does the >>> coaster service expect the workers to die cleanly (passive ones)? >> >> Hmm. They aren't expected to die. Which may be a problem. >> >> We (as in I) need to change that. Passive workers should advertise their >> walltime to the service and the service should take that into account so >> that jobs don't get sent to workers who don't have enough time left. I remember that previous versions of worker.pl had an idle timeout parameter. >> >> However, as inefficient as this may be, the service should notify the >> client that the jobs that were running on a dying worker have failed, >> and those jobs should be restarted by swift. Is that not happening?
>> In this case, it hasn't (yet) From hategan at mcs.anl.gov Wed May 11 20:59:06 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 11 May 2011 18:59:06 -0700 Subject: [Swift-devel] Re: Broken pipe on persistent coasters (was Re: Next steps on making the ExTENCI SCEC workflow run reliably) In-Reply-To: References: <1305149802.17175.3.camel@blabla2.none> <1305152280.18881.6.camel@blabla2.none> Message-ID: <1305165546.26007.2.camel@blabla2.none> On Wed, 2011-05-11 at 20:02 -0500, Allan Espinosa wrote: > 2011/5/11 Allan Espinosa : > > > 2011/5/11 Mihael Hategan : > >> On Wed, 2011-05-11 at 16:42 -0500, Allan Espinosa wrote: > >>> Right. Workers die because they exceed the maximum walltime. Does the > >>> coaster service expect the workers to die cleanly (passive ones)? > >> > >> Hmm. They aren't expected to die. Which may be a problem. > >> > >> We (as in I) need to change that. Passive workers should advertise their > >> walltime to the service and the service should take that into account so > >> that jobs don't get sent to workers who don't have enough time left. > > I remember that previous versions of the worker.pl has an idle timeout > parameter. That was only to shut them down in case they lost connection to the service, but I removed that since the heartbeats pretty much do the same thing. Or so I remember. > > >> > >> However, as inefficient as this may be, the service should notify the > >> client that the jobs that were running on a dying worker have failed, > >> and those jobs should be restarted by swift. Is that not happening? > >> > > In this case, it hasn't (yet) Ok. That's a bug, and I think it's a major bug in your case. Please file a bug report on it and I'll get to it as soon as I can.
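The behaviour asked for in this thread (notice that a passive worker has gone away, then report its jobs as failed so they can be retried) can be sketched with heartbeat timeouts. This is a minimal illustration, not the coaster service's code or protocol: `WorkerRegistry`, its methods, and the timeout value are all hypothetical.

```python
import time


class WorkerRegistry:
    """Tracks the last heartbeat of each worker and the jobs assigned to it.

    Hypothetical helper for illustration only.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_seen = {}  # worker id -> time of the last heartbeat
        self.jobs = {}       # worker id -> set of job ids running there

    def heartbeat(self, worker_id):
        """Record that the worker is alive right now."""
        self.last_seen[worker_id] = self.clock()
        self.jobs.setdefault(worker_id, set())

    def assign(self, worker_id, job_id):
        """Remember that job_id was dispatched to worker_id."""
        self.jobs.setdefault(worker_id, set()).add(job_id)

    def reap(self):
        """Forget workers whose heartbeat is overdue and return their jobs."""
        now = self.clock()
        failed = []
        for worker_id in list(self.last_seen):
            if now - self.last_seen[worker_id] > self.timeout:
                failed.extend(sorted(self.jobs.pop(worker_id, ())))
                del self.last_seen[worker_id]
        return failed


if __name__ == "__main__":
    fake_now = [0.0]  # a controllable clock keeps the demo deterministic
    registry = WorkerRegistry(timeout_seconds=30.0, clock=lambda: fake_now[0])
    registry.heartbeat("worker-1")
    registry.assign("worker-1", "sleep_pads-0")
    fake_now[0] = 45.0  # the worker has been silent for longer than the timeout
    print("jobs to report as failed:", registry.reap())
```

A service loop would call `reap()` periodically and send a failure notification to the client for each returned job; the retry itself would be left to Swift's existing restart logic.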
From hategan at mcs.anl.gov Thu May 12 12:17:26 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 12 May 2011 10:17:26 -0700 Subject: [Swift-devel] Re: Broken pipe on persistent coasters (was Re: Next steps on making the ExTENCI SCEC workflow run reliably) In-Reply-To: <1305165546.26007.2.camel@blabla2.none> References: <1305149802.17175.3.camel@blabla2.none> <1305152280.18881.6.camel@blabla2.none> <1305165546.26007.2.camel@blabla2.none> Message-ID: <1305220646.9453.2.camel@blabla2.none> On Wed, 2011-05-11 at 18:59 -0700, Mihael Hategan wrote: > > > > >> > > >> However, as inefficient as this may be, the service should notify the > > >> client that the jobs that were running on a dying worker have failed, > > >> and those jobs should be restarted by swift. Is that not happening? > > >> > > > > In this case, it hasn't (yet) > > Ok. That's a bug, and I think it's a major bug in your case. Please file > a bug report on it and I'll get to it as soon as I can. > I see what's happening. In automatic mode a walltime exceeded causes the task to fail and that is the channel through which the service finds out that something went wrong. In passive mode, you manage the task, and the service currently has no means of learning that a worker has failed. From aespinosa at cs.uchicago.edu Fri May 13 15:06:32 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 13 May 2011 15:06:32 -0500 Subject: [Swift-devel] Patch: make sure chxml is in $PATH Message-ID: My Swift sessions on trunk were reporting that 'chxml' is missing so I made sure that it is found. I don't have a test for this but hopefully this fixes trunk for other users. 
-Allan diff --git a/bin/swift b/bin/swift index ab13cdf..b9155af 100755 --- a/bin/swift +++ b/bin/swift @@ -63,7 +63,7 @@ CMDLINE=`fixCommandLine "$@"` # make sure sites.xml file is well-formed -chxml $CMDLINE +$SWIFT_HOME/bin/chxml $CMDLINE ### SETUP OTHER ENV VARIABLES #### From hategan at mcs.anl.gov Fri May 13 18:27:02 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 13 May 2011 16:27:02 -0700 Subject: [Swift-devel] DataNode.toString() Message-ID: <1305329222.3494.18.camel@blabla2.none> I changed that in trunk. It used to be: org.griphyn.vdl.mapping.RootDataNode identifier dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at dataset=sgt_var (not closed) That was annoying, noisy, and I had no idea what's what. It is now: name:type = value - [Open/Closed] The provenance data should still be the same, but it may not. So please let me know if anything breaks. From wilde at mcs.anl.gov Sat May 14 09:09:05 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 14 May 2011 09:09:05 -0500 (CDT) Subject: [Swift-devel] Test Re: [Swift-user] badness In-Reply-To: <1305324528.2320.0.camel@blabla2.none> Message-ID: <423163983.65438.1305382145624.JavaMail.root@zimbra.anl.gov> Can you post the test case to bugzilla? This would be a good exercise for David and/or Alberto as they work on extending the test suite, and a good practice in general for capturing such test scripts as regression tests. - Mike ----- Original Message ----- > Groovy. I can reproduce the problem. > > On Fri, 2011-05-13 at 15:46 -0500, Allan Espinosa wrote: > > Oh. http://www.ci.uchicago.edu/~aespinosa/postproc-trunk.tar.gz > > > > 2011/5/13 Mihael Hategan : > > > Emm... link to tar.gz? > > > > > > On Fri, 2011-05-13 at 15:28 -0500, Allan Espinosa wrote: > > >> WARNING: possible mail bomb, NOT CHECKED FOR VIRUSES: > > >> Maximum number of files (1500) exceeded at > > >> /usr/sbin/amavisd-new line 5048. 
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From yadudoc1729 at gmail.com Sat May 14 17:03:40 2011
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Sun, 15 May 2011 03:33:40 +0530
Subject: [Swift-devel] Setting up swift dev environment in Eclipse
Message-ID:

Hi,

I am trying to set up a development environment in Eclipse (Helios 3.6.2) on ubuntu 10.10. I've gotten eclipse running but I'm facing some trouble importing the source. I've listed out some issues that need clarification, any help or advice would be greatly appreciated.

1. The online doc [1] says just a File > Import is enough, but helios needs a project to start with. So I made a project, and in the setup wizard added the source.

2. There is an option to choose which version of java to use, I opted for java-6-sun-1.6.0.24

3. I tried to build from within eclipse but that generated some 30000+ errors. In order to step through the control flow, I need to get the code to compile from the IDE, using ant from the commandline doesn't help here.

I'm a GSoC student for swift this summer, and I'm new to the java dev environment. So please bear with me if these sound silly.

--
Thanks,
Yadu

From hategan at mcs.anl.gov Sat May 14 17:17:09 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 14 May 2011 15:17:09 -0700
Subject: [Swift-devel] Setting up swift dev environment in Eclipse
In-Reply-To:
References:
Message-ID: <1305411429.3337.4.camel@blabla2.none>

On Sun, 2011-05-15 at 03:33 +0530, Yadu Nand wrote:
> Hi,
>
> I am trying to set up a development environment in Eclipse (Helios 3.6.2)
> on ubuntu 10.10. I've gotten eclipse running but I'm facing some trouble
> importing the source. I've listed out some issues that need clarification,
> any help or advice would be greatly appreciated.
>
> 1. The online doc [1] says just a File > Import is enough, but helios needs
> a project to start with. So I made a project, and in the setup wizard added
> the source.

Which doc?

Import -> Existing Project Into Workspace. So you don't need to create a project.

>
> 2. There is an option to choose which version of java to use, I opted for
> java-6-sun-1.6.0.24

That's fine.

>
> 3. I tried to build from within eclipse but that generated some 30000+
> errors. In order to step through the control flow, I need to get the code
> to compile from the IDE, using ant from the commandline doesn't help
> here.

Not really. You can start the swift jvm with remote debugging enabled and then connect to the jvm from eclipse. There should be information online on how exactly to do that. Though you should still have the sources visible in eclipse.

Mihael

From hategan at mcs.anl.gov Sat May 14 17:33:12 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 14 May 2011 15:33:12 -0700
Subject: [Swift-devel] Setting up swift dev environment in Eclipse
In-Reply-To:
References: <1305411429.3337.4.camel@blabla2.none>
Message-ID: <1305412392.4201.6.camel@blabla2.none>

On Sun, 2011-05-15 at 03:54 +0530, Yadu Nand wrote:
> Sorry missed the link to the doc I was referring to.
> Here it is
> https://sites.google.com/site/swiftdevel/internals/new-developer-guide

That document doesn't seem quite complete.

I personally use eclipse to compile and run swift. Essentially things are organized into modules (in the checkout they would be in cog/modules). All these modules should have eclipse projects inside them (i.e. cog/modules/swift/.project). These modules may depend on one another. You should import them one by one ending up with multiple eclipse projects.
The essential modules are:

swift
abstraction
abstraction-common
provider-(local, gt2, coaster)
jglobus
util
karajan

There may be some others. In that case, eclipse will complain about missing projects, so also add the missing projects/modules if necessary.

From ketancmaheshwari at gmail.com Tue May 17 09:32:26 2011
From: ketancmaheshwari at gmail.com (ketan)
Date: Tue, 17 May 2011 09:32:26 -0500
Subject: [Swift-devel] ssh:pbs to beagle
In-Reply-To: <1304018230.10193.0.camel@blabla2.none>
References: <1403032176.2307.1304013826306.JavaMail.root@zimbra.anl.gov> <1952F257-772E-4C97-8AE8-0A73F8E9C7E8@gmail.com> <1304018230.10193.0.camel@blabla2.none>
Message-ID: <4DD286FA.6030002@gmail.com>

With lots of help from Mike, yesterday, we successfully submitted swift jobs from the Bridled machine to Beagle via ssh:pbs.

Following are the notes from the exercise:

1. Used automatic coasters with security, since there is no way to specify -nosec. This implies:
   a. make sure the proxy is valid on both ends (bridled and beagle), using grid-proxy-init
   b. make sure the CA certs are present on both ends, X509_CERT_DIR=/home/ketan/TRUSTEDCA, X509_CADIR=/home/ketan/TRUSTEDCA

2. For ssh authentication, make sure auth.defaults is in place with proper authentication info and permissions:
   a. ~/.ssh/auth.defaults looks like the following for key-based access:

bridled.ci.uchicago.edu.type=key
bridled.ci.uchicago.edu.username=uname
bridled.ci.uchicago.edu.key=/path/to/your/private_key
bridled.ci.uchicago.edu.passphrase=yourpassphrase

login1.beagle.ci.uchicago.edu.type=key
login1.beagle.ci.uchicago.edu.username=uname
login1.beagle.ci.uchicago.edu.key=/path/to/your/private_key
login1.beagle.ci.uchicago.edu.passphrase=yourpassphrase

   b. Make sure you have 600 perm on this auth.defaults file.

3. Java: We found the following exception was occurring because of IBM Java on Beagle:

Could not start connection handler
java.io.EOFException

We installed the Sun Java locally and the above exception was gone.

4. Owing to the fact that beagle login nodes cannot write on the /home filesystem, we encountered error 524 from worker.pl being unable to write workdirs/jobdirs to a previously set /home as workdir location. Make sure your workdir is set to /lustre/beagle/your/preferred/path. Alternatively, setting it to PADS /gpfs is also ok since worker nodes can write there. Beagle admins do not encourage this though.

To wrap up, following are the relevant files:

sites.xml:

CI-CCR000013
24
pbs.aprun;pbs.mpp;depth=24
24
1000
1
1
1
.63
10000
/lustre/beagle/ketan/swift.workdir
===========

tc:

ssh-pbs cat /bin/cat null null null
===========

cf: (note, provider staging is enabled, required)

wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=1
lazy.errors=true
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
foreach.max.threads=10
provenance.log=true
===========

swift commandline:

swift -config cf -tc.file tc -sites.file beagle-coaster.xml catsn.swift -n=1
===========

Regards,
Ketan

On 4/28/11 2:17 PM, Mihael Hategan wrote:
> What does your sites file look like?
> > On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: >> Ok, I got past CredentialException with grid-proxy-init, now I am facing this (note: I have turned on provider staging) : >> >> ======== >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) >> >> RunID: 20110428-1332-llaa031f >> Progress: >> Could not start connection handler >> java.io.EOFException >> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >> at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >> at org.globus.net.BaseServer.run(BaseServer.java:247) >> at java.lang.Thread.run(Thread.java:662) >> Progress: Submitted:1 >> Could not start connection handler >> java.io.EOFException >> at org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) >> at org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) >> at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) >> at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) >> at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) >> at 
org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) >> at org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) >> at org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) >> at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) >> at org.globus.net.BaseServer.run(BaseServer.java:247) >> at java.lang.Thread.run(Thread.java:662) >> Progress: Submitted:1 >> Exception in cat: >> Arguments: [data.txt] >> Host: beagle-remote-pbs-coasters-ssh >> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: outs >> ---- >> >> Caused by: Could not submit job >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service >> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Task ended before registration was received. >> STDOUT: >> STDERR: >> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1 >> Final status: Failed:1 >> The following errors have occurred: >> 1. 
Job failed with an exit code of 1 >> >> ======== >> >> >> From bridled to communicado, I see the following error: >> >> ************** >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally) >> >> RunID: 20110428-1335-k685b2ye >> Progress: >> Progress: Submitted:1 >> Progress: Active:1 >> Exception in cat: >> Arguments: [data.txt] >> Host: communicado-ssh >> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: outs >> ---- >> >> Caused by: Job failed with an exit code of 524 >> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 524 >> Final status: Failed:1 >> The following errors have occurred: >> 1. Job failed with an exit code of 524 >> >> ************ >> >> >> -- >> Ketan >> >> >> >> >> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: >> >>> For now - create a proxy using grid-proxy-init on the swift execution machine. >>> I think there is an option to set "no security" for this config but I cant recall where that is specified. Maybe swift.properties, I cant recall. >>> >>> - Mike >>> >>> ----- Original Message ----- >>>> Hi, >>>> >>>> It looks better now. 
However, I am getting the following: >>>> >>>> ===== >>>> >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc -sites.file >>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified >>>> locally) >>>> >>>> RunID: 20110428-1251-oi9theh8 >>>> Progress: >>>> Progress: Stage in:1 >>>> Could not submit job >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>> Could not submit job >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: >>>> Could not start coaster service >>>> Caused by: >>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: >>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file >>>> (/tmp/x509up_u2006) not found. >>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy >>>> file (/tmp/x509up_u2006) not found. >>>> Failed to transfer wrapper log from >>>> catsn-20110428-1251-oi9theh8/info/e on beagle-remote-pbs-coasters-ssh >>>> >>>> ===== >>>> >>>> How do I specify "-nosec" on automatic coasters? >>>> >>>> Ketan >>>> >>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: >>>> >>>>> OK. Was there a cookbook on the ssh settings? Did you set up a >>>>> $HOME/.ssh/auth.defaults per the user guide? >>>>> >>>>> Here is an auth.defaults example. Im not sure its 100% correct, but >>>>> could serve as a base for you: >>>>> >>>>> xlogin1.pads.ci.uchicago.edu.type=password >>>>> xlogin1.pads.ci.uchicago.edu.username=wilde >>>>> >>>>> login.pads.ci.uchicago.edu.type=key >>>>> login.pads.ci.uchicago.edu.username=wilde >>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE SURE >>>>> mode=600!!! 
>>>>> >>>>> login1.pads.ci.uchicago.edu.type=key >>>>> login1.pads.ci.uchicago.edu.username=wilde >>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa >>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE >>>>> SURE mode=600!!! >>>>> >>>>> login.mcs.anl.gov.type=key >>>>> login.mcs.anl.gov.username=wilde >>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa >>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE >>>>> mode=600!!! >>>>> >>>>> - Mike >>>>> >>>>> ----- Original Message ----- >>>>>> It does look like an ssh problem. I am getting the same stderr and >>>>>> log >>>>>> messages on trying to communicate from Bridled to Communicado. >>>>>> >>>>>> Ketan >>>>>> >>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: >>>>>> >>>>>>> Have you already run a simple hellow-world swift test from >>>>>>> communicado to bridled to make sure you have ssh configured >>>>>>> correctly? I would do that first. >>>>>>> >>>>>>> Im not sure if an ssh problem explains what you show below, or >>>>>>> not. >>>>>>> >>>>>>> - Mike >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>>> Thanks, I made the change. 
However, now, I am getting the >>>>>>>> following >>>>>>>> on >>>>>>>> my stderr >>>>>>>> >>>>>>>> >>>>>>>> =========== >>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>> -sites.file >>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>>>> modified >>>>>>>> locally) >>>>>>>> >>>>>>>> RunID: 20110428-1022-n9s0k0e0 >>>>>>>> Progress: >>>>>>>> [ketan] >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> [ketan] Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> Progress: Initializing site shared directory:1 >>>>>>>> ======== >>>>>>>> >>>>>>>> And from the log it seems some network transmission has failed: >>>>>>>> >>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon Sending >>>>>>>> SSH_MSG_SERVICE_REQUEST >>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon >>>>>>>> Received >>>>>>>> SSH_MSG_SERVICE_ACCEPT >>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The >>>>>>>> Transport Protocol thread failed >>>>>>>> java.io.IOException: The socket is EOF >>>>>>>> at >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) >>>>>>>> at >>>>>>>> 
com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) >>>>>>>> at >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) >>>>>>>> at >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) >>>>>>>> at >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) >>>>>>>> at java.lang.Thread.run(Thread.java:662) >>>>>>>> >>>>>>>> >>>>>>>> Any clues? >>>>>>>> Ketan >>>>>>>> >>>>>>>> >>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: >>>>>>>> >>>>>>>>> The pool name in your sites file is pads-remote-pbs-coasters-ssh >>>>>>>>> but >>>>>>>>> you used pbs in your tc.data. >>>>>>>>> >>>>>>>>> - Mike >>>>>>>>> >>>>>>>>> ----- Original Message ----- >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> Some context: >>>>>>>>>> I am trying to submit a big run on Beagle using swift + >>>>>>>>>> coasters. >>>>>>>>>> However, a previous run is already underway on beagle. So, >>>>>>>>>> there >>>>>>>>>> are >>>>>>>>>> two difficulties running a new run from its login node: >>>>>>>>>> >>>>>>>>>> 1. Running another swift from the same jvm will result in chaos >>>>>>>>>> on >>>>>>>>>> the >>>>>>>>>> logs (As far as I know, please correct me if this is not the >>>>>>>>>> case >>>>>>>>>> anymore) >>>>>>>>>> >>>>>>>>>> 2. Login node is already under load because of my running >>>>>>>>>> previous >>>>>>>>>> big >>>>>>>>>> run. >>>>>>>>>> >>>>>>>>>> /context >>>>>>>>>> >>>>>>>>>> So, I am now trying to submit this big run from a remote host >>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, >>>>>>>>>> provider >>>>>>>>>> coaster. I tried the similar approach on a trial swift script >>>>>>>>>> but >>>>>>>>>> getting error. 
>>>>>>>>>> >>>>>>>>>> Following is the error message: >>>>>>>>>> >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc >>>>>>>>>> -sites.file >>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog >>>>>>>>>> modified >>>>>>>>>> locally) >>>>>>>>>> >>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 >>>>>>>>>> Progress: >>>>>>>>>> The application "cat" is not available in your tc.data catalog >>>>>>>>>> Caused by: >>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException >>>>>>>>>> Final status: Failed:1 >>>>>>>>>> The following errors have occurred: >>>>>>>>>> 1. The application "cat" is not available in your tc.data >>>>>>>>>> catalog >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. >>>>>>>>>> >>>>>>>>>> Could someone indicate if what I am doing is doable and if so >>>>>>>>>> how >>>>>>>>>> can >>>>>>>>>> I correctly configure my sites and tc setup. >>>>>>>>>> >>>>>>>>>> Thanks. 
>>>>>>>>>> Ketan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Swift-devel mailing list >>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>> -- >>>>>>>>> Michael Wilde >>>>>>>>> Computation Institute, University of Chicago >>>>>>>>> Mathematics and Computer Science Division >>>>>>>>> Argonne National Laboratory >>>>>>>>> >>>>>>> -- >>>>>>> Michael Wilde >>>>>>> Computation Institute, University of Chicago >>>>>>> Mathematics and Computer Science Division >>>>>>> Argonne National Laboratory >>>>>>> >>>>> -- >>>>> Michael Wilde >>>>> Computation Institute, University of Chicago >>>>> Mathematics and Computer Science Division >>>>> Argonne National Laboratory >>>>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Tue May 17 09:42:09 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 17 May 2011 09:42:09 -0500 (CDT) Subject: [Swift-devel] ssh:pbs to beagle In-Reply-To: <4DD286FA.6030002@gmail.com> Message-ID: <301496644.73974.1305643329916.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > With lots of help from Mike, yesterday, we successfully submitted > swift > jobs from Bridled machine to Beagle via ssh:pbs. > > Following are the notes from the exercise: > > 1. Used, automatic coasters, with security, since there is no way to > specify -nosec. This implies: > a. make sure proxy is valid on both ends (bridled and beagle), > using grid-proxy-init > b. 
make sure ca certs are present on both ends,
> X509_CERT_DIR=/home/ketan/TRUSTEDCA, X509_CADIR=/home/ketan/TRUSTEDCA

We could specify -nosec if we do a version of this with the external coaster-service process (at the cost of a bit more complexity, but we're trying to wrap that nicely in reliable scripts).

This suggests, though, that many of the options provided by coaster-service should also be made available when the coaster service is run inside Swift (-passive, -nosec, and the options to write at least the worker connection port# to a file for scripting). The latter may still take some synchronization effort within a wrapper script that manually starts the workers.

- Mike

> 2. For ssh authentication, make sure the auth.defaults is in place
> with
> proper authentication info and permissions:
> a. ~/.ssh/auth.defaults looks like the following for a key-based
> access:
>
> bridled.ci.uchicago.edu.type=key
> bridled.ci.uchicago.edu.username=uname
> bridled.ci.uchicago.edu.key=/path/to/your/private_key
> bridled.ci.uchicago.edu.passphrase=yourpassphrase
>
>
> login1.beagle.ci.uchicago.edu.type=key
> login1.beagle.ci.uchicago.edu.username=uname
> login1.beagle.ci.uchicago.edu.key=/path/to/your/private_key
> login1.beagle.ci.uchicago.edu.passphrase=yourpassphrase
>
> b. Make sure you have 600 perm on this auth.defaults file.
>
> 3. Java: We found the following exception was occurring because of IBM
> java on Beagle:
>
> Could not start connection handler
> java.io.EOFException
>
> We installed locally the Sun java and the above exception was gone.

This makes me wonder if we should require Sun Java on Beagle (and make a module for it). It also suggests that our test suite be used to "certify" Swift on various Java implementations: at least we should advertise which Java we do release testing on and show users how to certify the release themselves on other Javas.

>
> 4. Owing to the fact that beagle login nodes cannot write on /home
> filesystem, we encountered error 524 from worker.pl being unable to
> write workdirs/jobdirs to a previously set /home as workdir location.
> Make sure your workdir is set to /lustre/beagle/your/preferred/path.
> Alternatively, setting it to PADS /gpfs is also ok since worker nodes
> can write there. Beagle admins do not encourage this though.

Mihael or Justin: I was surprised to see that coaster provider staging used the tag to determine the jobdir on the compute node, on Beagle where /tmp is not writeable. I always thought that it would honor the tag to let the user specify the provider staging jobdir. But this seems not to be the case. Can you clarify how the jobdir is determined in the provider staging case and also when the scratch tag is used and not?

- Mike

>
> To wrap up, following are the relevant files;
> sites.xml:
>
>
>
> jobmanager="ssh:pbs"/>
> CI-CCR000013
>
> 24
>
> key="providerAttributes">pbs.aprun;pbs.mpp;depth=24
>
> 24
> 1000
> 1
> 1
> 1
>
> .63
> 10000
>
> /lustre/beagle/ketan/swift.workdir
>
>
> ===========
>
> tc:
>
> ssh-pbs cat /bin/cat null null null
> ===========
>
> cf: (note, provider staging is enabled, required)
>
> wrapperlog.always.transfer=true
> sitedir.keep=true
> execution.retries=1
> lazy.errors=true
> status.mode=provider
> use.provider.staging=true
> provider.staging.pin.swiftfiles=false
> foreach.max.threads=10
> provenance.log=true
> ===========
>
> swift commandline:
>
> swift -config cf -tc.file tc -sites.file beagle-coaster.xml
> catsn.swift -n=1
> ===========
>
>
> Regards,
> Ketan
>
>
> On 4/28/11 2:17 PM, Mihael Hategan wrote:
> > What does your sites file look like?
> > > > On Thu, 2011-04-28 at 13:36 -0500, Ketan Maheshwari wrote: > >> Ok, I got past CredentialException with grid-proxy-init, now I am > >> facing this (note: I have turned on provider staging) : > >> > >> ======== > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >> -sites.file beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >> modified locally) > >> > >> RunID: 20110428-1332-llaa031f > >> Progress: > >> Could not start connection handler > >> java.io.EOFException > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >> at > >> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >> at > >> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >> at > >> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >> at > >> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >> at org.globus.net.BaseServer.run(BaseServer.java:247) > >> at java.lang.Thread.run(Thread.java:662) > >> Progress: Submitted:1 > >> Could not start connection handler > >> java.io.EOFException > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssInputStream.readHandshakeToken(GSIGssInputStream.java:61) > >> at > >> org.globus.gsi.gssapi.net.impl.GSIGssSocket.readToken(GSIGssSocket.java:65) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:127) > >> at > >> 
org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147) > >> at > >> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177) > >> at > >> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:30) > >> at > >> org.globus.cog.karajan.workflow.service.channels.GSSChannel.(GSSChannel.java:47) > >> at > >> org.globus.cog.karajan.workflow.service.ConnectionHandler.(ConnectionHandler.java:41) > >> at > >> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:63) > >> at org.globus.net.BaseServer.run(BaseServer.java:247) > >> at java.lang.Thread.run(Thread.java:662) > >> Progress: Submitted:1 > >> Exception in cat: > >> Arguments: [data.txt] > >> Host: beagle-remote-pbs-coasters-ssh > >> Directory: catsn-20110428-1332-llaa031f/jobs/b/cat-bxal1d9kTODO: > >> outs > >> ---- > >> > >> Caused by: Could not submit job > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Could not submit job > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Could not start coaster service > >> Caused by: > >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >> Task ended before registration was received. > >> STDOUT: > >> STDERR: > >> Caused by: > >> org.globus.cog.abstraction.impl.common.execution.JobException: Job > >> failed with an exit code of 1 > >> Final status: Failed:1 > >> The following errors have occurred: > >> 1. 
Job failed with an exit code of 1 > >> > >> ======== > >> > >> > >> From bridled to communicado, I see the following error: > >> > >> ************** > >> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >> -sites.file coaster-local-ssh-communicado.xml catsn.swift -n=1 > >> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >> modified locally) > >> > >> RunID: 20110428-1335-k685b2ye > >> Progress: > >> Progress: Submitted:1 > >> Progress: Active:1 > >> Exception in cat: > >> Arguments: [data.txt] > >> Host: communicado-ssh > >> Directory: catsn-20110428-1335-k685b2ye/jobs/c/cat-coip1d9kTODO: > >> outs > >> ---- > >> > >> Caused by: Job failed with an exit code of 524 > >> Caused by: > >> org.globus.cog.abstraction.impl.common.execution.JobException: Job > >> failed with an exit code of 524 > >> Final status: Failed:1 > >> The following errors have occurred: > >> 1. Job failed with an exit code of 524 > >> > >> ************ > >> > >> > >> -- > >> Ketan > >> > >> > >> > >> > >> On Apr 28, 2011, at 1:03 PM, Michael Wilde wrote: > >> > >>> For now - create a proxy using grid-proxy-init on the swift > >>> execution machine. > >>> I think there is an option to set "no security" for this config > >>> but I cant recall where that is specified. Maybe swift.properties, > >>> I cant recall. > >>> > >>> - Mike > >>> > >>> ----- Original Message ----- > >>>> Hi, > >>>> > >>>> It looks better now. 
However, I am getting the following: > >>>> > >>>> ===== > >>>> > >>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>> -sites.file > >>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>> modified > >>>> locally) > >>>> > >>>> RunID: 20110428-1251-oi9theh8 > >>>> Progress: > >>>> Progress: Stage in:1 > >>>> Could not submit job > >>>> Caused by: > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>> Could not submit job > >>>> Caused by: > >>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > >>>> Could not start coaster service > >>>> Caused by: > >>>> org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: > >>>> org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file > >>>> (/tmp/x509up_u2006) not found. > >>>> Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] > >>>> Proxy > >>>> file (/tmp/x509up_u2006) not found. > >>>> Failed to transfer wrapper log from > >>>> catsn-20110428-1251-oi9theh8/info/e on > >>>> beagle-remote-pbs-coasters-ssh > >>>> > >>>> ===== > >>>> > >>>> How do I specify "-nosec" on automatic coasters? > >>>> > >>>> Ketan > >>>> > >>>> On Apr 28, 2011, at 12:00 PM, Michael Wilde wrote: > >>>> > >>>>> OK. Was there a cookbook on the ssh settings? Did you set up a > >>>>> $HOME/.ssh/auth.defaults per the user guide? > >>>>> > >>>>> Here is an auth.defaults example. Im not sure its 100% correct, > >>>>> but > >>>>> could serve as a base for you: > >>>>> > >>>>> xlogin1.pads.ci.uchicago.edu.type=password > >>>>> xlogin1.pads.ci.uchicago.edu.username=wilde > >>>>> > >>>>> login.pads.ci.uchicago.edu.type=key > >>>>> login.pads.ci.uchicago.edu.username=wilde > >>>>> login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>> login.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>>>> SURE > >>>>> mode=600!!! 
> >>>>> > >>>>> login1.pads.ci.uchicago.edu.type=key > >>>>> login1.pads.ci.uchicago.edu.username=wilde > >>>>> login1.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa > >>>>> login1.pads.ci.uchicago.edu.passphrase=yourpassphrasehere # MAKE > >>>>> SURE mode=600!!! > >>>>> > >>>>> login.mcs.anl.gov.type=key > >>>>> login.mcs.anl.gov.username=wilde > >>>>> login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa > >>>>> login.mcs.anl.gov.passphrase=yourpassphrasehere # MAKE SURE > >>>>> mode=600!!! > >>>>> > >>>>> - Mike > >>>>> > >>>>> ----- Original Message ----- > >>>>>> It does look like an ssh problem. I am getting the same stderr > >>>>>> and > >>>>>> log > >>>>>> messages on trying to communicate from Bridled to Communicado. > >>>>>> > >>>>>> Ketan > >>>>>> > >>>>>> On Apr 28, 2011, at 11:19 AM, Michael Wilde wrote: > >>>>>> > >>>>>>> Have you already run a simple hellow-world swift test from > >>>>>>> communicado to bridled to make sure you have ssh configured > >>>>>>> correctly? I would do that first. > >>>>>>> > >>>>>>> Im not sure if an ssh problem explains what you show below, or > >>>>>>> not. > >>>>>>> > >>>>>>> - Mike > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> Thanks, I made the change. 
However, now, I am getting the > >>>>>>>> following > >>>>>>>> on > >>>>>>>> my stderr > >>>>>>>> > >>>>>>>> > >>>>>>>> =========== > >>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>> -sites.file > >>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog > >>>>>>>> modified > >>>>>>>> locally) > >>>>>>>> > >>>>>>>> RunID: 20110428-1022-n9s0k0e0 > >>>>>>>> Progress: > >>>>>>>> [ketan] > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> [ketan] Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> Progress: Initializing site shared directory:1 > >>>>>>>> ======== > >>>>>>>> > >>>>>>>> And from the log it seems some network transmission has > >>>>>>>> failed: > >>>>>>>> > >>>>>>>> 2011-04-28 10:22:45,261-0500 INFO TransportProtocolCommon > >>>>>>>> Sending > >>>>>>>> SSH_MSG_SERVICE_REQUEST > >>>>>>>> 2011-04-28 10:22:45,264-0500 INFO TransportProtocolCommon > >>>>>>>> Received > >>>>>>>> SSH_MSG_SERVICE_ACCEPT > >>>>>>>> 2011-04-28 10:24:27,626-0500 INFO TransportProtocolCommon The > >>>>>>>> Transport Protocol thread failed > >>>>>>>> java.io.IOException: The socket is EOF > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readBufferedData(TransportProtocolInputStream.java:183) > >>>>>>>> 
at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolInputStream.readMessage(TransportProtocolInputStream.java:226) > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.processMessages(TransportProtocolCommon.java:1440) > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.startBinaryPacketProtocol(TransportProtocolCommon.java:1034) > >>>>>>>> at > >>>>>>>> com.sshtools.j2ssh.transport.TransportProtocolCommon.run(TransportProtocolCommon.java:393) > >>>>>>>> at java.lang.Thread.run(Thread.java:662) > >>>>>>>> > >>>>>>>> > >>>>>>>> Any clues? > >>>>>>>> Ketan > >>>>>>>> > >>>>>>>> > >>>>>>>> On Apr 28, 2011, at 10:20 AM, Michael Wilde wrote: > >>>>>>>> > >>>>>>>>> The pool name in your sites file is > >>>>>>>>> pads-remote-pbs-coasters-ssh > >>>>>>>>> but > >>>>>>>>> you used pbs in your tc.data. > >>>>>>>>> > >>>>>>>>> - Mike > >>>>>>>>> > >>>>>>>>> ----- Original Message ----- > >>>>>>>>>> Hello, > >>>>>>>>>> > >>>>>>>>>> Some context: > >>>>>>>>>> I am trying to submit a big run on Beagle using swift + > >>>>>>>>>> coasters. > >>>>>>>>>> However, a previous run is already underway on beagle. So, > >>>>>>>>>> there > >>>>>>>>>> are > >>>>>>>>>> two difficulties running a new run from its login node: > >>>>>>>>>> > >>>>>>>>>> 1. Running another swift from the same jvm will result in > >>>>>>>>>> chaos > >>>>>>>>>> on > >>>>>>>>>> the > >>>>>>>>>> logs (As far as I know, please correct me if this is not > >>>>>>>>>> the > >>>>>>>>>> case > >>>>>>>>>> anymore) > >>>>>>>>>> > >>>>>>>>>> 2. Login node is already under load because of my running > >>>>>>>>>> previous > >>>>>>>>>> big > >>>>>>>>>> run. > >>>>>>>>>> > >>>>>>>>>> /context > >>>>>>>>>> > >>>>>>>>>> So, I am now trying to submit this big run from a remote > >>>>>>>>>> host > >>>>>>>>>> (bridled). I know this has been done on PADS using ssh:pbs, > >>>>>>>>>> provider > >>>>>>>>>> coaster. 
I tried the similar approach on a trial swift > >>>>>>>>>> script > >>>>>>>>>> but > >>>>>>>>>> getting error. > >>>>>>>>>> > >>>>>>>>>> Following is the error message: > >>>>>>>>>> > >>>>>>>>>> [ketan at bridled catsn.works]$ swift -config cf -tc.file tc > >>>>>>>>>> -sites.file > >>>>>>>>>> beagle-coaster-ssh-pbs.xml catsn.swift -n=1 > >>>>>>>>>> Swift svn swift-r4252 (swift modified locally) cog-r3088 > >>>>>>>>>> (cog > >>>>>>>>>> modified > >>>>>>>>>> locally) > >>>>>>>>>> > >>>>>>>>>> RunID: 20110428-1002-c8rvqhe6 > >>>>>>>>>> Progress: > >>>>>>>>>> The application "cat" is not available in your tc.data > >>>>>>>>>> catalog > >>>>>>>>>> Caused by: > >>>>>>>>>> org.globus.cog.karajan.scheduler.NoSuchResourceException > >>>>>>>>>> Final status: Failed:1 > >>>>>>>>>> The following errors have occurred: > >>>>>>>>>> 1. The application "cat" is not available in your tc.data > >>>>>>>>>> catalog > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Attached are my .swift, sites.xml and tc.data files. > >>>>>>>>>> > >>>>>>>>>> Could someone indicate if what I am doing is doable and if > >>>>>>>>>> so > >>>>>>>>>> how > >>>>>>>>>> can > >>>>>>>>>> I correctly configure my sites and tc setup. > >>>>>>>>>> > >>>>>>>>>> Thanks. 
> >>>>>>>>>> Ketan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> Swift-devel mailing list > >>>>>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>>>> -- > >>>>>>>>> Michael Wilde > >>>>>>>>> Computation Institute, University of Chicago > >>>>>>>>> Mathematics and Computer Science Division > >>>>>>>>> Argonne National Laboratory > >>>>>>>>> > >>>>>>> -- > >>>>>>> Michael Wilde > >>>>>>> Computation Institute, University of Chicago > >>>>>>> Mathematics and Computer Science Division > >>>>>>> Argonne National Laboratory > >>>>>>> > >>>>> -- > >>>>> Michael Wilde > >>>>> Computation Institute, University of Chicago > >>>>> Mathematics and Computer Science Division > >>>>> Argonne National Laboratory > >>>>> > >>> -- > >>> Michael Wilde > >>> Computation Institute, University of Chicago > >>> Mathematics and Computer Science Division > >>> Argonne National Laboratory > >>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From yadudoc1729 at gmail.com Tue May 17 11:14:39 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 17 May 2011 21:44:39 +0530 Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: <1305412392.4201.6.camel@blabla2.none> References: <1305411429.3337.4.camel@blabla2.none> <1305412392.4201.6.camel@blabla2.none> Message-ID: Hi, I imported the following modules into a new workspace : abstraction abstraction-common abstraction-provider-(coaster,condor,dcache,gt2,gt4_0_0,local, localscheduler, webdav, ssh) grapheditor jglobus karajan resources swift util 
Now, I am able to compile properly into a java application but 3 errors and 3801 warnings remain. I am not sure if these need to be fixed. So, I'm pasting the 3 errors. Description Resource Path Location Type org.globus.cog.abstraction.impl.file.local cannot be resolved to a type FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line 20 Java Problem Description Resource Path Location Type The method resolve(String) is undefined for the type FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line 78 Java Problem Description Resource Path Location Type The method resolve(String) is undefined for the type FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line 79 Java Problem On Sun, May 15, 2011 at 4:03 AM, Mihael Hategan wrote: > > > On Sun, 2011-05-15 at 03:54 +0530, Yadu Nand wrote: >> Sorry missed the link to the doc I was referring to. >> Here it is >> https://sites.google.com/site/swiftdevel/internals/new-developer-guide > > That document doesn't seem quite complete. > > I personally use eclipse to compile and run swift. > > Essentially things are organized into modules (in the checkout they > would be in cog/modules). All these modules should have eclipse projects > inside them (i.e. cog/modules/swift/.project). These modules may depend > on one another. You should import them one by one ending up with > multiple eclipse projects. > > The essential modules are: > swift > abstraction > abstraction-common > provider-(local, gt2, coaster) > jglobus > util > karajan > > There may be some others. In that case, eclipse will complain about > missing projects, so also add the missing projects/modules if necessary. 
> > > -- Thanks and Regards, Yadu Nand B From hategan at mcs.anl.gov Tue May 17 12:03:30 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 17 May 2011 10:03:30 -0700 Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: References: <1305411429.3337.4.camel@blabla2.none> <1305412392.4201.6.camel@blabla2.none> Message-ID: <1305651810.30986.0.camel@blabla2.none> Either ignore that or close provider-dcache. Mihael On Tue, 2011-05-17 at 21:44 +0530, Yadu Nand wrote: > Hi, > > I imported the following modules into a new workspace : > > abstraction > abstraction-common > abstraction-provider-(coaster,condor,dcache,gt2,gt4_0_0,local, > localscheduler, webdav, ssh) > grapheditor > jglobus > karajan > resources > swift > util > > Now, I am able to compile properly into a java application but 3 errors > and 3801 warnings remain. I am not sure if these need to be fixed. > So, I'm pasting the 3 errors. > > Description Resource Path Location Type > org.globus.cog.abstraction.impl.file.local cannot be resolved to a > type FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line > 20 Java Problem > > Description Resource Path Location Type > The method resolve(String) is undefined for the type > FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line > 78 Java Problem > > Description Resource Path Location Type > The method resolve(String) is undefined for the type > FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line > 79 Java Problem > > > On Sun, May 15, 2011 at 4:03 AM, Mihael Hategan wrote: > > > > > > On Sun, 2011-05-15 at 03:54 +0530, Yadu Nand wrote: > >> Sorry missed the link to the doc I was referring to. > >> Here it is > >> https://sites.google.com/site/swiftdevel/internals/new-developer-guide > > > > That document doesn't seem quite complete. 
> > > > I personally use eclipse to compile and run swift. > > > > Essentially things are organized into modules (in the checkout they > > would be in cog/modules). All these modules should have eclipse projects > > inside them (i.e. cog/modules/swift/.project). These modules may depend > > on one another. You should import them one by one ending up with > > multiple eclipse projects. > > > > The essential modules are: > > swift > > abstraction > > abstraction-common > > provider-(local, gt2, coaster) > > jglobus > > util > > karajan > > > > There may be some others. In that case, eclipse will complain about > > missing projects, so also add the missing projects/modules if necessary. > > > > > > > From wozniak at mcs.anl.gov Tue May 17 12:54:18 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Tue, 17 May 2011 12:54:18 -0500 (Central Daylight Time) Subject: [Swift-devel] www cron behavior Message-ID: Hello Does anyone have cron doing anything in the Swift www space? If so, please let David and me know. Thanks -- Justin M Wozniak From yadudoc1729 at gmail.com Tue May 17 13:07:58 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 17 May 2011 23:37:58 +0530 Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: <1305651810.30986.0.camel@blabla2.none> References: <1305411429.3337.4.camel@blabla2.none> <1305412392.4201.6.camel@blabla2.none> <1305651810.30986.0.camel@blabla2.none> Message-ID: Hi, Ignoring abstraction-provider-dcache (as in removing it from the list of imported modules ) brought me back to an error saying : Description Resource Path Location Type Project 'abstraction' is missing required Java project: 'abstraction-provider-dcache' abstraction Build path Build Path Problem Description Resource Path Location Type The project cannot be built until build path errors are resolved abstraction Unknown Java Problem On Tue, May 17, 2011 at 10:33 PM, Mihael Hategan wrote: > Either ignore that or close provider-dcache. 
> > Mihael > > On Tue, 2011-05-17 at 21:44 +0530, Yadu Nand wrote: >> Hi, >> >> I imported the following modules into a new workspace : >> >> abstraction >> abstraction-common >> abstraction-provider-(coaster,condor,dcache,gt2,gt4_0_0,local, >> localscheduler, webdav, ssh) >> grapheditor >> jglobus >> karajan >> resources >> swift >> util >> >> Now, I am able to compile properly into a java application but 3 errors >> and 3801 warnings remain. I am not sure if these need to be fixed. >> So, I'm pasting the 3 errors. >> >> Description Resource Path Location Type >> org.globus.cog.abstraction.impl.file.local cannot be resolved to a >> type FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line >> 20 Java Problem >> >> Description Resource Path Location Type >> The method resolve(String) is undefined for the type >> FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line >> 78 Java Problem >> >> Description Resource Path Location Type >> The method resolve(String) is undefined for the type >> FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line >> 79 Java Problem >> >> >> On Sun, May 15, 2011 at 4:03 AM, Mihael Hategan wrote: >> > >> > >> > On Sun, 2011-05-15 at 03:54 +0530, Yadu Nand wrote: >> >> Sorry missed the link to the doc I was referring to. >> >> Here it is >> >> https://sites.google.com/site/swiftdevel/internals/new-developer-guide >> > >> > That document doesn't seem quite complete. >> > >> > I personally use eclipse to compile and run swift. >> > >> > Essentially things are organized into modules (in the checkout they >> > would be in cog/modules). All these modules should have eclipse projects >> > inside them (i.e. cog/modules/swift/.project).
These modules may depend >> > on one another. You should import them one by one ending up with >> > multiple eclipse projects. >> > >> > The essential modules are: >> > swift >> > abstraction >> > abstraction-common >> > provider-(local, gt2, coaster) >> > jglobus >> > util >> > karajan >> > >> > There may be some others. In that case, eclipse will complain about >> > missing projects, so also add the missing projects/modules if necessary. >> > >> > >> > >> > > > -- Thanks and Regards, Yadu Nand B From hategan at mcs.anl.gov Tue May 17 13:52:13 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 17 May 2011 11:52:13 -0700 Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: References: <1305411429.3337.4.camel@blabla2.none> <1305412392.4201.6.camel@blabla2.none> <1305651810.30986.0.camel@blabla2.none> Message-ID: <1305658333.31768.9.camel@blabla2.none> Click on project "abstraction", right-click to bring up the context menu, choose "Properties" (alternatively Alt+Enter), click "Java Build Path", choose the "Projects" tab and there remove the dependency on provider-dcache. Again, you could simply ignore the issue. The compiler will insert dummy code for methods that don't compile. That code will throw an exception if you try to invoke the respective methods, but since you're not going to use that provider, the issue will not show up. 
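As a rough illustration of the "dummy code" behavior described above: the sketch below imitates, by hand, what the Eclipse compiler is generally understood to emit for a method it could not compile, namely a stub that fails only when it is actually invoked. The class names (UnresolvedMethodSketch, BrokenProvider, WorkingProvider) and the paths are hypothetical and are not taken from the CoG or Swift sources; this is a sketch of the idea, not of the dcache provider itself.

```java
// Hypothetical sketch: all names here are illustrative, not CoG classes.
public class UnresolvedMethodSketch {

    // Stands in for a provider class that failed to compile in Eclipse.
    static class BrokenProvider {
        // Hand-written imitation of the stub body: it fails only if called.
        String resolve(String path) {
            throw new Error("Unresolved compilation problem: "
                    + "The method resolve(String) is undefined");
        }
    }

    // Stands in for a provider that compiled cleanly.
    static class WorkingProvider {
        String resolve(String path) {
            return "/local" + path;
        }
    }

    // Returns true if invoking the stubbed method raises an Error.
    static boolean brokenProviderFails() {
        try {
            new BrokenProvider().resolve("/tmp/x");
            return false;
        } catch (Error e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Code paths that use only the working provider never reach the stub.
        System.out.println(new WorkingProvider().resolve("/tmp/x"));
        System.out.println("broken provider fails when called: "
                + brokenProviderFails());
    }
}
```

The point of the sketch is only that the failure is deferred to call time, which is why an unused provider with compile errors does not get in the way of running everything else.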
On Tue, 2011-05-17 at 23:37 +0530, Yadu Nand wrote: > Hi, > > Ignoring abstraction-provider-dcache (as in removing it from the list > of imported > modules ) brought me back to an error saying : > > Description Resource Path Location Type > Project 'abstraction' is missing required Java project: > 'abstraction-provider-dcache' abstraction Build path Build Path > Problem > > Description Resource Path Location Type > The project cannot be built until build path errors are > resolved abstraction Unknown Java Problem > > > On Tue, May 17, 2011 at 10:33 PM, Mihael Hategan wrote: > > Either ignore that or close provider-dcache. > > > > Mihael > > > > On Tue, 2011-05-17 at 21:44 +0530, Yadu Nand wrote: > >> Hi, > >> > >> I imported the following modules into a new workspace : > >> > >> abstraction > >> abstraction-common > >> abstraction-provider-(coaster,condor,dcache,gt2,gt4_0_0,local, > >> localscheduler, webdav, ssh) > >> grapheditor > >> jglobus > >> karajan > >> resources > >> swift > >> util > >> > >> Now, I am able to compile properly into a java application but 3 errors > >> and 3801 warnings remain. I am not sure if these need to be fixed. > >> So, I'm pasting the 3 errors. 
> >> > >> Description Resource Path Location Type > >> org.globus.cog.abstraction.impl.file.local cannot be resolved to a > >> type FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line > >> 20 Java Problem > >> > >> Description Resource Path Location Type > >> The method resolve(String) is undefined for the type > >> FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line > >> 78 Java Problem > >> > >> Description Resource Path Location Type > >> The method resolve(String) is undefined for the type > >> FileResourceImpl FileResourceImpl.java /abstraction-provider-dcache/src/org/globus/cog/abstraction/impl/file/dcache line > >> 79 Java Problem > >> > >> > >> On Sun, May 15, 2011 at 4:03 AM, Mihael Hategan wrote: > >> > > >> > > >> > On Sun, 2011-05-15 at 03:54 +0530, Yadu Nand wrote: > >> >> Sorry missed the link to the doc I was referring to. > >> >> Here it is > >> >> https://sites.google.com/site/swiftdevel/internals/new-developer-guide > >> > > >> > That document doesn't seem quite complete. > >> > > >> > I personally use eclipse to compile and run swift. > >> > > >> > Essentially things are organized into modules (in the checkout they > >> > would be in cog/modules). All these modules should have eclipse projects > >> > inside them (i.e. cog/modules/swift/.project). These modules may depend > >> > on one another. You should import them one by one ending up with > >> > multiple eclipse projects. > >> > > >> > The essential modules are: > >> > swift > >> > abstraction > >> > abstraction-common > >> > provider-(local, gt2, coaster) > >> > jglobus > >> > util > >> > karajan > >> > > >> > There may be some others. In that case, eclipse will complain about > >> > missing projects, so also add the missing projects/modules if necessary. 
> >> > > >> > > >> > > >> > > > > > > > > > From wozniak at mcs.anl.gov Tue May 17 16:56:08 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Tue, 17 May 2011 16:56:08 -0500 (CDT) Subject: [Swift-devel] DataNode.toString() In-Reply-To: <1305329222.3494.18.camel@blabla2.none> References: <1305329222.3494.18.camel@blabla2.none> Message-ID: I think this change affects arguments to apps: type file; (file f) echo (int i) { app { echo i stdout=@f; } } int greetings = 2; file hw = echo(greetings); ------> DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo arguments=[greetings:int = 2.0 - Closed] ... On Fri, 13 May 2011, Mihael Hategan wrote: > I changed that in trunk. It used to be: > > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at > dataset=sgt_var (not closed) > > That was annoying, noisy, and I had no idea what's what. > > It is now: > > name:type = value - [Open/Closed] > > The provenance data should still be the same, but it may not. So please > let me know if anything breaks. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Justin M Wozniak From hategan at mcs.anl.gov Tue May 17 17:51:06 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 17 May 2011 15:51:06 -0700 Subject: [Swift-devel] DataNode.toString() In-Reply-To: References: <1305329222.3494.18.camel@blabla2.none> Message-ID: <1305672666.3986.2.camel@blabla2.none> Indeed. I committed a fix to unwrap swift data when passing values to execute() (stdin, out, err, and the arguments in particular). I don't think toString() should be "overloaded" like it was. 
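A minimal sketch of the distinction being drawn here, assuming a simplified data node: toString() is kept for log output in the "name:type = value - Closed" style, while application arguments are built from the unwrapped value. The names Node, getValue, and toArguments are hypothetical and do not correspond to the actual org.griphyn.vdl.mapping classes or to the committed fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: Node and its methods are illustrative names only.
public class ArgUnwrapSketch {

    static class Node {
        final String name;
        final String type;
        final Object value;
        final boolean closed;

        Node(String name, String type, Object value, boolean closed) {
            this.name = name;
            this.type = type;
            this.value = value;
            this.closed = closed;
        }

        // Raw value, suitable for handing to an application.
        Object getValue() {
            return value;
        }

        // Debug form for logs, in the style "name:type = value - Closed".
        @Override
        public String toString() {
            return name + ":" + type + " = " + value + " - "
                    + (closed ? "Closed" : "Open");
        }
    }

    // Builds command-line arguments from unwrapped values, never toString().
    static List<String> toArguments(List<Node> nodes) {
        List<String> args = new ArrayList<>();
        for (Node n : nodes) {
            args.add(String.valueOf(n.getValue()));
        }
        return args;
    }

    public static void main(String[] args) {
        Node greetings = new Node("greetings", "int", 2, true);
        System.out.println("log line:  " + greetings);
        System.out.println("arguments: " + toArguments(Arrays.asList(greetings)));
    }
}
```

Keeping the two paths separate means the log format can change freely without altering what an app receives on its command line.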
On Tue, 2011-05-17 at 16:56 -0500, Justin M Wozniak wrote: > I think this change affects arguments to apps: > > type file; > > (file f) echo (int i) { > app { echo i stdout=@f; } > } > > int greetings = 2; > file hw = echo(greetings); > > ------> > > DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo > arguments=[greetings:int = 2.0 - Closed] > ... > > On Fri, 13 May 2011, Mihael Hategan wrote: > > > I changed that in trunk. It used to be: > > > > org.griphyn.vdl.mapping.RootDataNode identifier > > dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at > > dataset=sgt_var (not closed) > > > > That was annoying, noisy, and I had no idea what's what. > > > > It is now: > > > > name:type = value - [Open/Closed] > > > > The provenance data should still be the same, but it may not. So please > > let me know if anything breaks. > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From hategan at mcs.anl.gov Thu May 19 15:52:08 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 19 May 2011 13:52:08 -0700 Subject: [Swift-devel] telecon Message-ID: <1305838328.6364.0.camel@blabla2.none> I don't have skype working. Is there any way this can be bridged to normal phone calls? From wilde at mcs.anl.gov Thu May 19 17:03:07 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 19 May 2011 17:03:07 -0500 (CDT) Subject: [Swift-devel] telecon In-Reply-To: <1305838328.6364.0.camel@blabla2.none> Message-ID: <363095809.87424.1305842587769.JavaMail.root@zimbra.anl.gov> Sure. We just need to remember to start bridging you in. - Mike ----- Original Message ----- > I don't have skype working. Is there any way this can be bridged to > normal phone calls? 
> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Thu May 19 17:10:07 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 19 May 2011 15:10:07 -0700 Subject: [Swift-devel] telecon In-Reply-To: <363095809.87424.1305842587769.JavaMail.root@zimbra.anl.gov> References: <363095809.87424.1305842587769.JavaMail.root@zimbra.anl.gov> Message-ID: <1305843007.8401.0.camel@blabla2.none> Wasn't this happening an hour ago? On Thu, 2011-05-19 at 17:03 -0500, Michael Wilde wrote: > Sure. We just need to remember to start bridging you in. > > - Mike > > ----- Original Message ----- > > I don't have skype working. Is there any way this can be bridged to > > normal phone calls? > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Thu May 19 17:23:59 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 19 May 2011 17:23:59 -0500 (CDT) Subject: [Swift-devel] telecon In-Reply-To: <1305843007.8401.0.camel@blabla2.none> Message-ID: <1126504715.87500.1305843839492.JavaMail.root@zimbra.anl.gov> 25 hours ago :) Wed 4PM CDT. Sorry, we should have patched you in - will start doing that. Best if you can IM Justin or me that you are ready for the call. - Mike ----- Original Message ----- > Wasn't this happening an hour ago? > > On Thu, 2011-05-19 at 17:03 -0500, Michael Wilde wrote: > > Sure. We just need to remember to start bridging you in. > > > > - Mike > > > > ----- Original Message ----- > > > I don't have skype working. Is there any way this can be bridged > > > to > > > normal phone calls? 
> > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wozniak at mcs.anl.gov Thu May 19 18:05:38 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Thu, 19 May 2011 18:05:38 -0500 (Central Daylight Time) Subject: [Swift-devel] telecon In-Reply-To: <1126504715.87500.1305843839492.JavaMail.root@zimbra.anl.gov> References: <1126504715.87500.1305843839492.JavaMail.root@zimbra.anl.gov> Message-ID: I'll send reminder emails in the future. On Thu, 19 May 2011, Michael Wilde wrote: > 25 hours ago :) Wed 4PM CDT. Sorry, we should have patched you in - > will start doing that. Best if you can IM Justin or me that you are > ready for the call. > > - Mike > > ----- Original Message ----- >> Wasn't this happening an hour ago? >> >> On Thu, 2011-05-19 at 17:03 -0500, Michael Wilde wrote: >>> Sure. We just need to remember to start bridging you in. >>> >>> - Mike >>> >>> ----- Original Message ----- >>>> I don't have skype working. Is there any way this can be bridged >>>> to >>>> normal phone calls? >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > > -- Justin M Wozniak From ketancmaheshwari at gmail.com Fri May 20 15:14:26 2011 From: ketancmaheshwari at gmail.com (ketan) Date: Fri, 20 May 2011 15:14:26 -0500 Subject: [Swift-devel] recent trunk changes swift parsing Message-ID: <4DD6CBA2.10407@gmail.com> I updated trunk and seems swift parsing has changed a bit? 
I see this: Swift svn swift-r4502 cog-r3128 (cog modified locally) RunID: 20110520-2005-kst5ztsf Progress: time: Fri, 20 May 2011 20:05:51 +0000 SwiftScript trace: str_roots.[0]:string = 3lyv-4 - Closed Execution failed: For input string: "3.0" for my this swift: type file_pdb; type file_dat; app (file_dat dat_file) do_one_dock( string param_root, string param_modulo, file_pdb param_file_static, file_pdb param_file_mobile ) { modftdock 32 "-modulo" @param_modulo "-root" @param_root "-static" @param_file_static "-mobile" @param_file_mobile "-calculate_grid" @arg("grid","2.5") "-angle_step" "10" "-keep" "10" "-noelec"; } string modulus = @arg("m","100"); string str_roots[] = readData( @arg( "list" ) ); foreach str_root in str_roots { trace( str_root ); string str_file_static = @strcat( "input/", str_root, ".pdb" ); string str_file_mobile = "input/4TRA.pdb"; file_pdb file_static < single_file_mapper; file = str_file_static >; file_pdb file_mobile < single_file_mapper; file = str_file_mobile >; file_dat dat_files[] < simple_mapper; padding = 3, location = "output", prefix = @strcat( str_root, "_" ), suffix = ".dat" >; // break docking jobs + do 'em in parallel int n = @toint(@arg("n","1")); foreach mod_index in [0:n-1] { string str_modulo = @strcat(mod_index, ":", modulus); dat_files[ mod_index ] = do_one_dock( str_root, str_modulo, file_static, file_mobile ); } } From hategan at mcs.anl.gov Fri May 20 15:31:28 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 20 May 2011 13:31:28 -0700 Subject: [Swift-devel] recent trunk changes swift parsing In-Reply-To: <4DD6CBA2.10407@gmail.com> References: <4DD6CBA2.10407@gmail.com> Message-ID: <1305923488.5757.0.camel@blabla2.none> Can you post a stack trace or a pointer to the log? On Fri, 2011-05-20 at 15:14 -0500, ketan wrote: > I updated trunk and seems swift parsing has changed a bit? 
> > I see this: > > Swift svn swift-r4502 cog-r3128 (cog modified locally) > > RunID: 20110520-2005-kst5ztsf > Progress: time: Fri, 20 May 2011 20:05:51 +0000 > SwiftScript trace: str_roots.[0]:string = 3lyv-4 - Closed > Execution failed: > For input string: "3.0" > > > for my this swift: > > type file_pdb; > type file_dat; > > app (file_dat dat_file) > do_one_dock( > string param_root, > string param_modulo, > file_pdb param_file_static, > file_pdb param_file_mobile ) > { > modftdock 32 > "-modulo" @param_modulo > "-root" @param_root > "-static" @param_file_static > "-mobile" @param_file_mobile > "-calculate_grid" @arg("grid","2.5") > "-angle_step" "10" > "-keep" "10" > "-noelec"; > } > > string modulus = @arg("m","100"); > string str_roots[] = readData( @arg( "list" ) ); > > foreach str_root in str_roots > { > trace( str_root ); > > string str_file_static = @strcat( "input/", str_root, ".pdb" ); > string str_file_mobile = "input/4TRA.pdb"; > > file_pdb file_static < single_file_mapper; file = str_file_static >; > file_pdb file_mobile < single_file_mapper; file = str_file_mobile >; > file_dat dat_files[] < simple_mapper; > padding = 3, > location = "output", > prefix = @strcat( str_root, "_" ), > suffix = ".dat" >; > > // break docking jobs + do 'em in parallel > int n = @toint(@arg("n","1")); > foreach mod_index in [0:n-1] > { > string str_modulo = @strcat(mod_index, ":", modulus); > dat_files[ mod_index ] = do_one_dock( str_root, > str_modulo, > file_static, > file_mobile ); > } > } > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wozniak at mcs.anl.gov Fri May 20 15:43:55 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Fri, 20 May 2011 15:43:55 -0500 (CDT) Subject: [Swift-devel] recent trunk changes swift parsing In-Reply-To: <1305923488.5757.0.camel@blabla2.none> References: <4DD6CBA2.10407@gmail.com> 
<1305923488.5757.0.camel@blabla2.none> Message-ID: Here are two related cases to consider (one commented out). type file; app (file o) touch() { touch @o; } file t[] ; /* foreach i in t { i = touch(); } */ foreach i in [1:5] { t[i] = touch(); } On Fri, 20 May 2011, Mihael Hategan wrote: > Can you post a stack trace or a pointer to the log? > > On Fri, 2011-05-20 at 15:14 -0500, ketan wrote: >> I updated trunk and seems swift parsing has changed a bit? >> >> I see this: >> >> Swift svn swift-r4502 cog-r3128 (cog modified locally) >> >> RunID: 20110520-2005-kst5ztsf >> Progress: time: Fri, 20 May 2011 20:05:51 +0000 >> SwiftScript trace: str_roots.[0]:string = 3lyv-4 - Closed >> Execution failed: >> For input string: "3.0" >> >> >> for my this swift: >> >> type file_pdb; >> type file_dat; >> >> app (file_dat dat_file) >> do_one_dock( >> string param_root, >> string param_modulo, >> file_pdb param_file_static, >> file_pdb param_file_mobile ) >> { >> modftdock 32 >> "-modulo" @param_modulo >> "-root" @param_root >> "-static" @param_file_static >> "-mobile" @param_file_mobile >> "-calculate_grid" @arg("grid","2.5") >> "-angle_step" "10" >> "-keep" "10" >> "-noelec"; >> } >> >> string modulus = @arg("m","100"); >> string str_roots[] = readData( @arg( "list" ) ); >> >> foreach str_root in str_roots >> { >> trace( str_root ); >> >> string str_file_static = @strcat( "input/", str_root, ".pdb" ); >> string str_file_mobile = "input/4TRA.pdb"; >> >> file_pdb file_static < single_file_mapper; file = str_file_static >; >> file_pdb file_mobile < single_file_mapper; file = str_file_mobile >; >> file_dat dat_files[] < simple_mapper; >> padding = 3, >> location = "output", >> prefix = @strcat( str_root, "_" ), >> suffix = ".dat" >; >> >> // break docking jobs + do 'em in parallel >> int n = @toint(@arg("n","1")); >> foreach mod_index in [0:n-1] >> { >> string str_modulo = @strcat(mod_index, ":", modulus); >> dat_files[ mod_index ] = do_one_dock( str_root, >> str_modulo, >> 
file_static, >> file_mobile ); >> } >> } >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Justin M Wozniak From hategan at mcs.anl.gov Fri May 20 16:15:39 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 20 May 2011 14:15:39 -0700 Subject: [Swift-devel] recent trunk changes swift parsing In-Reply-To: References: <4DD6CBA2.10407@gmail.com> <1305923488.5757.0.camel@blabla2.none> Message-ID: <1305926139.7346.0.camel@blabla2.none> I'm not following. Are these supposed to fail? On Fri, 2011-05-20 at 15:43 -0500, Justin M Wozniak wrote: > Here are two related cases to consider (one commented out). > > type file; > > app (file o) touch() > { > touch @o; > } > > file t[] ; > > /* > foreach i in t > { > i = touch(); > } > */ > > foreach i in [1:5] > { > t[i] = touch(); > } > > > On Fri, 20 May 2011, Mihael Hategan wrote: > > > Can you post a stack trace or a pointer to the log? > > > > On Fri, 2011-05-20 at 15:14 -0500, ketan wrote: > >> I updated trunk and seems swift parsing has changed a bit? 
> >> > >> I see this: > >> > >> Swift svn swift-r4502 cog-r3128 (cog modified locally) > >> > >> RunID: 20110520-2005-kst5ztsf > >> Progress: time: Fri, 20 May 2011 20:05:51 +0000 > >> SwiftScript trace: str_roots.[0]:string = 3lyv-4 - Closed > >> Execution failed: > >> For input string: "3.0" > >> > >> > >> for my this swift: > >> > >> type file_pdb; > >> type file_dat; > >> > >> app (file_dat dat_file) > >> do_one_dock( > >> string param_root, > >> string param_modulo, > >> file_pdb param_file_static, > >> file_pdb param_file_mobile ) > >> { > >> modftdock 32 > >> "-modulo" @param_modulo > >> "-root" @param_root > >> "-static" @param_file_static > >> "-mobile" @param_file_mobile > >> "-calculate_grid" @arg("grid","2.5") > >> "-angle_step" "10" > >> "-keep" "10" > >> "-noelec"; > >> } > >> > >> string modulus = @arg("m","100"); > >> string str_roots[] = readData( @arg( "list" ) ); > >> > >> foreach str_root in str_roots > >> { > >> trace( str_root ); > >> > >> string str_file_static = @strcat( "input/", str_root, ".pdb" ); > >> string str_file_mobile = "input/4TRA.pdb"; > >> > >> file_pdb file_static < single_file_mapper; file = str_file_static >; > >> file_pdb file_mobile < single_file_mapper; file = str_file_mobile >; > >> file_dat dat_files[] < simple_mapper; > >> padding = 3, > >> location = "output", > >> prefix = @strcat( str_root, "_" ), > >> suffix = ".dat" >; > >> > >> // break docking jobs + do 'em in parallel > >> int n = @toint(@arg("n","1")); > >> foreach mod_index in [0:n-1] > >> { > >> string str_modulo = @strcat(mod_index, ":", modulus); > >> dat_files[ mod_index ] = do_one_dock( str_root, > >> str_modulo, > >> file_static, > >> file_mobile ); > >> } > >> } > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at 
ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From bresnaha at mcs.anl.gov Fri May 20 16:35:34 2011 From: bresnaha at mcs.anl.gov (John Bresnahan) Date: Fri, 20 May 2011 11:35:34 -1000 Subject: [Swift-devel] obtaining FG accounts Message-ID: <4DD6DEA6.2040006@mcs.anl.gov> Hello, I just finished a phone call with some of you who were interested in getting future grid accounts. Here is the process: The best resource for Nimbus clouds right now is FutureGrid (http://www.futuregrid.org). There are 3 sizable Nimbus clouds and soon to be more. You should have no trouble getting an account. Fill out this form to request access: https://portal.futuregrid.org/user/register And once you have an account, check this tutorial on getting started with Nimbus: https://portal.futuregrid.org/tutorials/nimbus John From dsk at ci.uchicago.edu Fri May 20 16:40:50 2011 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Fri, 20 May 2011 13:40:50 -0800 Subject: [Swift-devel] an interesting paper Message-ID: <8929B2EC-09D8-410B-BC96-9D662BE48DDC@ci.uchicago.edu> From an IPDPS workshop... -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-6818 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 4385d228.pdf Type: application/pdf Size: 323928 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ketancmaheshwari at gmail.com Fri May 20 17:02:18 2011 From: ketancmaheshwari at gmail.com (ketan) Date: Fri, 20 May 2011 17:02:18 -0500 Subject: [Swift-devel] recent trunk changes swift parsing In-Reply-To: <1305926139.7346.0.camel@blabla2.none> References: <4DD6CBA2.10407@gmail.com> <1305923488.5757.0.camel@blabla2.none> <1305926139.7346.0.camel@blabla2.none> Message-ID: <4DD6E4EA.8080804@gmail.com> Hi, >>>> file_dat dat_files[]< simple_mapper; >>>> padding = 3, >>>> location = "output", >>>> prefix = @strcat( str_root, "_" ), >>>> suffix = ".dat">; >>>> The padding attribute in above simple_mapper seems to be failing. When I wrap 3 in double quotes (padding = "3"), it works. Mike tells me Sarah made recent modifications to simple_mapper code that might have caused it. Ketan From hategan at mcs.anl.gov Fri May 20 17:06:05 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 20 May 2011 15:06:05 -0700 Subject: [Swift-devel] recent trunk changes swift parsing In-Reply-To: <4DD6E4EA.8080804@gmail.com> References: <4DD6CBA2.10407@gmail.com> <1305923488.5757.0.camel@blabla2.none> <1305926139.7346.0.camel@blabla2.none> <4DD6E4EA.8080804@gmail.com> Message-ID: <1305929165.10049.0.camel@blabla2.none> That's great info, but there are few things more specific and useful than a stack trace in this case. Mihael On Fri, 2011-05-20 at 17:02 -0500, ketan wrote: > Hi, > > >>>> file_dat dat_files[]< simple_mapper; > >>>> padding = 3, > >>>> location = "output", > >>>> prefix = @strcat( str_root, "_" ), > >>>> suffix = ".dat">; > >>>> > > The padding attribute in above simple_mapper seems to be failing. When I > wrap 3 in double quotes (padding = "3"), it works. Mike tells me Sarah > made recent modifications to simple_mapper code that might have caused it. 
> > > Ketan > From hategan at mcs.anl.gov Fri May 20 17:18:00 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 20 May 2011 15:18:00 -0700 Subject: [Swift-devel] recent trunk changes swift parsing In-Reply-To: <1305929165.10049.0.camel@blabla2.none> References: <4DD6CBA2.10407@gmail.com> <1305923488.5757.0.camel@blabla2.none> <1305926139.7346.0.camel@blabla2.none> <4DD6E4EA.8080804@gmail.com> <1305929165.10049.0.camel@blabla2.none> Message-ID: <1305929880.11070.1.camel@blabla2.none> Well, I'd like to commit the fix, but it seems that the SVN server is not recognizing my password. So here's the patch for somebody whose password works: --- mapping/MappingParam.java (revision 4473) +++ mapping/MappingParam.java (working copy) @@ -3,6 +3,7 @@ import java.util.Map; import org.griphyn.vdl.karajan.VDL2FutureException; +import org.griphyn.vdl.type.Types; /** The MappingParam class provides helper methods to deal with * parameters to mappers. The basic usage pattern is to @@ -44,7 +45,12 @@ if (value instanceof DSHandle) { DSHandle handle = (DSHandle) value; checkHandle(handle); - return handle.getValue().toString(); + if (handle.getType().equals(Types.INT)) { + return Integer.valueOf(((Number) handle.getValue()).intValue()); + } + else { + return handle.getValue().toString(); + } } else if (value == null) { if (!defSet) { On Fri, 2011-05-20 at 15:06 -0700, Mihael Hategan wrote: > That's great info, but there are few things more specific and useful > than a stack trace in this case. > > Mihael > > On Fri, 2011-05-20 at 17:02 -0500, ketan wrote: > > Hi, > > > > >>>> file_dat dat_files[]< simple_mapper; > > >>>> padding = 3, > > >>>> location = "output", > > >>>> prefix = @strcat( str_root, "_" ), > > >>>> suffix = ".dat">; > > >>>> > > > > The padding attribute in above simple_mapper seems to be failing. When I > > wrap 3 in double quotes (padding = "3"), it works. Mike tells me Sarah > > made recent modifications to simple_mapper code that might have caused it. 
> > > > > > Ketan > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From bresnaha at mcs.anl.gov Fri May 20 19:20:48 2011 From: bresnaha at mcs.anl.gov (John Bresnahan) Date: Fri, 20 May 2011 14:20:48 -1000 Subject: [Swift-devel] Getting VMs from FG for use with swift Message-ID: <4DD70560.1020901@mcs.anl.gov> Our phone call today left me motiviated to show you guys how easy it is to get virtual machines for use with swift on FutureGrid. I made some small scripts around the Nimbus tool cloudinitd. The scripts just make installing the software and running it trivial. With a single command you can get N VMs from the FutureGrid Nimbus clouds (N can be on the order of hundreds). When the tool is done it outputs a line separated list of hostnames. All of these hostnames have root access available via your ~/.ssh/id_rsa keys. If/when you have FutureGrid credentials, untar the attachment and give it a try. There are a few minor configurations needed: 1) edit the file env.sh and set your FutureGrid security credentials: % cat env.sh export FUTUREGRID_IAAS_ACCESS_KEY=XXXXXXXXXXXXXXXXXX export FUTUREGRID_IAAS_SECRET_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX export FUTUREGRID_HOTEL_NODES=2 export FUTUREGRID_SIERRA_NODES=2 You can also change the value '2' to be whatever number of VMs you want. 2) install it on your system. (this single command downloads and installs everything you need under the cwd): % ./install.sh 3) boot the VMs % ./bin/bootit.sh. You will see much status output, but the last several lines will be the hostnames acquired from the cloud. Let me know when you guys are ready to check this out! -------------- next part -------------- A non-text attachment was scrubbed... 
Name: swift-vm-boot.tar.gz Type: application/x-gzip Size: 36436 bytes Desc: not available URL: From ketancmaheshwari at gmail.com Fri May 20 21:06:22 2011 From: ketancmaheshwari at gmail.com (ketan) Date: Fri, 20 May 2011 21:06:22 -0500 Subject: [Swift-devel] cdm Message-ID: <4DD71E1E.6020406@gmail.com> Hello, Seems CDM with pbs provider is not working in the trunk version. Following is the CDM entry: rule .* DIRECT /lustre/beagle/ketan/labs/modftdock/production/campaign3 The input data resides in a folder called "input" in the above path. I do not get any error message but nothing seems to be executing since empty output files are created. Ketan From yadudoc1729 at gmail.com Sat May 21 02:41:21 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Sat, 21 May 2011 13:11:21 +0530 Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: <1305658333.31768.9.camel@blabla2.none> References: <1305411429.3337.4.camel@blabla2.none> <1305412392.4201.6.camel@blabla2.none> <1305651810.30986.0.camel@blabla2.none> <1305658333.31768.9.camel@blabla2.none> Message-ID: Hi, I've gotten swift running with Eclipse as a front-end debugger. I'm documenting the steps and links which were helpful to me with the hope that, newbies like me will find it useful. Justin has put the info here : https://sites.google.com/site/swiftdevel/internals/debugging I'm keeping everything I've done on a public google doc here: https://docs.google.com/document/d/1z5UvA2yUM_NjaATn-_YzH1EQUBrJvn2dMCT-nWIffLk/edit?hl=en_US -- Thanks and Regards, Yadu Nand B From benc at hawaga.org.uk Sat May 21 03:27:40 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 21 May 2011 08:27:40 +0000 (GMT) Subject: [Swift-devel] an interesting paper In-Reply-To: <8929B2EC-09D8-410B-BC96-9D662BE48DDC@ci.uchicago.edu> References: <8929B2EC-09D8-410B-BC96-9D662BE48DDC@ci.uchicago.edu> Message-ID: On Fri, 20 May 2011, Daniel S. Katz wrote: > From an IPDPS workshop... 
all that fooling around to order things! single-assignment is much better ;) -- From wilde at mcs.anl.gov Sat May 21 06:54:58 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 06:54:58 -0500 (CDT) Subject: [Swift-devel] Fwd: [hpc-announce] LCPC call for papers - Languages and Compilers for Parallel Computing In-Reply-To: <8A812602-51B0-40AB-8B0C-0CC628D0E327@CS.ColoState.EDU> Message-ID: <990326905.92397.1305978898564.JavaMail.root@zimbra.anl.gov> ----- Forwarded Message ----- From: "Michelle Strout" To: hpc-announce at mcs.anl.gov Sent: Friday, May 20, 2011 10:34:56 AM Subject: [hpc-announce] LCPC call for papers [Apologies if you got multiple copies of this email. If you'd like to opt out of these announcements, information on how to unsubscribe is available at the bottom of this email.] The 24th International Workshop on Languages and Compilers for Parallel Computing Colorado State University, Fort Collins, Colorado September 8-10, 2011 The LCPC workshop (http://lcpc11.cs.colostate.edu/) is a forum for sharing cutting-edge research on all aspects of parallel languages, compilers and related topics including runtime systems and tools. The scope of the workshop spans foundational results and practical experience, and all classes of parallel processors including concurrent, multithreaded, multicore, accelerated, multiprocessor, and tightly-clustered systems. Given the rise of multicore processors, LCPC is particularly interested in work that seeks to transition parallel programming into the computing mainstream. Specific topics of interest include (but are not limited to):
* Parallel programming models and languages
* Compiling for parallelism
* Automatic parallelization
* Analysis, optimization, and verification of parallel programs
* Parallel runtime systems
* Parallel libraries
* Parallel application frameworks
* Performance analysis tools
* Debugging tools for parallel programs
* Parallel algorithms
* Parallel applications
* Concurrent data structures
* Synchronization and concurrency control
* Software engineering for parallel programs
* Fault tolerance for parallel systems
* Adaptive compilation and optimization of parallel programs
* Software techniques for accelerators (including GPGPUs)
Papers should report on original research, and should include enough background material to make them accessible to the entire LCPC research community. Papers describing experiences should indicate how they illustrate general principles; papers about parallel programming foundations should indicate how they relate to practice. All submissions must be made electronically through the submission link in the web site. Abstracts must include contact information, the full list of authors and their affiliations, and a summary description (100-300 words) of the anticipated content of the paper. LCPC 2011 papers are limited to 15 pages in the Springer LNCS format. Papers must be submitted in PDF format and must be viewable by Adobe Acrobat Reader. The proceedings will be published by Springer. Authors of accepted papers and posters will be required to sign the Springer copyright form. The submission website will be available by early May. Instructions for preparing papers for the proceedings will be emailed to authors of accepted papers. This year, the Third Annual Workshop on Concurrent Collections will be co-located with LCPC on September 7, 2011. 
(http://cnc11.rice.edu/) Important Dates: Abstracts: May 27, 2011 Full papers: June 3, 2011 Notification: July 27, 2011 Final paper: Aug 29, 2011 Workshop: Sept 8-10, 2011 ================================== Michelle Mills Strout Associate Professor Colorado State University Computer Science Department 1873 Campus Delivery Fort Collins, CO 80523-1873 (970) 491-4193 mstrout at cs.colostate.edu ================================== ------- The hpc-announce mailing list has been setup to have a common mailing list to share information with respect to upcoming HPC related events. You are included in this mailing list based on your participation or interest in a previous HPC conference or other event. The purpose for providing a single mailing list is to allow participants to easily identify such emails, and if you feel that the number of such emails is too many, possibly even filter them to less-frequently-read folders. However, if you do not wish to receive any emails from hpc-announce, you can unsubscribe from the mailing list (https://lists.mcs.anl.gov/mailman/listinfo/hpc-announce). Once unsubscribed, we guarantee that you will not be added back in through participation in a different HPC related conference or event. You will need to send an email to hpc-announce-owner at mcs.anl.gov to be added back on. -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat May 21 06:56:40 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 06:56:40 -0500 (CDT) Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: Message-ID: <204547436.92399.1305979000028.JavaMail.root@zimbra.anl.gov> Nice progress, and thanks for documenting it, Yadu. We'll give you write permission on the swiftdevel site, so you can help grow the documentation for new (and old ;) Swift developers. 
Regards, Mike ----- Original Message ----- > Hi, > > I've gotten swift running with Eclipse as a front-end debugger. > I'm documenting the steps and links which were helpful to me > with the hope that, newbies like me will find it useful. > > Justin has put the info here : > https://sites.google.com/site/swiftdevel/internals/debugging > > I'm keeping everything I've done on a public google doc here: > https://docs.google.com/document/d/1z5UvA2yUM_NjaATn-_YzH1EQUBrJvn2dMCT-nWIffLk/edit?hl=en_US > > -- > Thanks and Regards, > Yadu Nand B > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat May 21 07:12:12 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 07:12:12 -0500 (CDT) Subject: [Swift-devel] cdm In-Reply-To: <4DD71E1E.6020406@gmail.com> Message-ID: <359417195.92426.1305979932613.JavaMail.root@zimbra.anl.gov> Ketan, please package up and post a complete example that includes the 5 major files (.swift.=, sites, tc, properties, and fs.data) and show a run on both trunk and 0.92.1, with both log files and the _swiftwrap info log files. Thanks, Mike ----- Original Message ----- > Hello, > > Seems CDM with pbs provider is not working in the trunk version. > Following is the CDM entry: > > rule .* DIRECT > /lustre/beagle/ketan/labs/modftdock/production/campaign3 > > The input data resides in a folder called "input" in the above path. > > I do not get any error message but nothing seems to be executing since > empty output files are created. 
> > > Ketan > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat May 21 07:16:04 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 07:16:04 -0500 (CDT) Subject: [Swift-devel] Re: Getting VMs from FG for use with swift In-Reply-To: <4DD70560.1020901@mcs.anl.gov> Message-ID: <1003969351.92428.1305980164804.JavaMail.root@zimbra.anl.gov> Thanks, John. Hopefully David's account will get created soon and he can test this. I have a Future Grid account that Ive never used; I need to learn how, and will try to test as well. I encourage the rest of the team to get FG accounts when you have a moment. David, can you post this info on Justin's swiftdevel page for new developers? Thanks, - Mike ----- Original Message ----- > Our phone call today left me motiviated to show you guys how easy it > is to get virtual machines for > use with swift on FutureGrid. > > I made some small scripts around the Nimbus tool cloudinitd. The > scripts just make installing the > software and running it trivial. With a single command you can get N > VMs from the FutureGrid Nimbus > clouds (N can be on the order of hundreds). When the tool is done it > outputs a line separated list > of hostnames. All of these hostnames have root access available via > your ~/.ssh/id_rsa keys. > > If/when you have FutureGrid credentials, untar the attachment and give > it a try. 
There are a few > minor configurations needed: > > > 1) edit the file env.sh and set your FutureGrid security credentials: > > % cat env.sh > export FUTUREGRID_IAAS_ACCESS_KEY=XXXXXXXXXXXXXXXXXX > export FUTUREGRID_IAAS_SECRET_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > export FUTUREGRID_HOTEL_NODES=2 > export FUTUREGRID_SIERRA_NODES=2 > > You can also change the value '2' to be whatever number of VMs you > want. > > > 2) install it on your system. (this single command downloads and > installs everything you need under > the cwd): > > % ./install.sh > > 3) boot the VMs > % ./bin/bootit.sh. > > You will see much status output, but the last several lines will be > the hostnames acquired from the > cloud. > > Let me know when you guys are ready to check this out! -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From yadudoc1729 at gmail.com Sat May 21 08:04:32 2011 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Sat, 21 May 2011 18:34:32 +0530 Subject: [Swift-devel] Setting up swift dev environment in Eclipse In-Reply-To: <204547436.92399.1305979000028.JavaMail.root@zimbra.anl.gov> References: <204547436.92399.1305979000028.JavaMail.root@zimbra.anl.gov> Message-ID: Thanks Michael. The write permission to the swiftdevel site will be great. On Sat, May 21, 2011 at 5:26 PM, Michael Wilde wrote: > Nice progress, and thanks for documenting it, Yadu. We'll give you write permission on the swiftdevel site, so you can help grow the documentation for new (and old ;) Swift developers. > > Regards, > > Mike > > > ----- Original Message ----- >> Hi, >> >> I've gotten swift running with Eclipse as a front-end debugger. >> I'm documenting the steps and links which were helpful to me >> with the hope that, newbies like me will find it useful. 
>> >> Justin has put the info here : >> https://sites.google.com/site/swiftdevel/internals/debugging >> >> I'm keeping everything I've done on a public google doc here: >> https://docs.google.com/document/d/1z5UvA2yUM_NjaATn-_YzH1EQUBrJvn2dMCT-nWIffLk/edit?hl=en_US >> >> -- >> Thanks and Regards, >> Yadu Nand B >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Thanks and Regards, Yadu Nand B From ketan at mcs.anl.gov Sat May 21 11:22:50 2011 From: ketan at mcs.anl.gov (ketan) Date: Sat, 21 May 2011 11:22:50 -0500 Subject: [Swift-devel] cdm In-Reply-To: <359417195.92426.1305979932613.JavaMail.root@zimbra.anl.gov> References: <359417195.92426.1305979932613.JavaMail.root@zimbra.anl.gov> Message-ID: <4DD7E6DA.5070005@mcs.anl.gov> Attached are the two tgz packages with swift, config, site, cdm and log files corresponding to trunk and 0.92.1 swift on tested on Beagle for cdm. cdm did not seem to work in either case. Ketan On 5/21/11 7:12 AM, Michael Wilde wrote: > Ketan, please package up and post a complete example that includes the 5 major files (.swift.=, sites, tc, properties, and fs.data) and show a run on both trunk and 0.92.1, with both log files and the _swiftwrap info log files. > > Thanks, > > Mike > > > ----- Original Message ----- >> Hello, >> >> Seems CDM with pbs provider is not working in the trunk version. >> Following is the CDM entry: >> >> rule .* DIRECT >> /lustre/beagle/ketan/labs/modftdock/production/campaign3 >> >> The input data resides in a folder called "input" in the above path. >> >> I do not get any error message but nothing seems to be executing since >> empty output files are created. 
>> >> >> Ketan >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -------------- next part -------------- A non-text attachment was scrubbed... Name: trunkpack.tgz Type: application/x-gzip Size: 296960 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0.92.1-pack.tgz Type: application/x-gzip Size: 51200 bytes Desc: not available URL: From ketancmaheshwari at gmail.com Sat May 21 11:33:38 2011 From: ketancmaheshwari at gmail.com (ketan) Date: Sat, 21 May 2011 11:33:38 -0500 Subject: [Swift-devel] cdm In-Reply-To: <359417195.92426.1305979932613.JavaMail.root@zimbra.anl.gov> References: <359417195.92426.1305979932613.JavaMail.root@zimbra.anl.gov> Message-ID: <4DD7E962.2060701@gmail.com> Please find below the links to two tgz packages with swift, config, site, cdm and log files corresponding to trunk and 0.92.1 swift on tested on Beagle for cdm. cdm did not seem to work in either case. http://www.ci.uchicago.edu/~ketan/files/0.92.1-pack.tgz http://www.ci.uchicago.edu/~ketan/files/trunkpack.tgz Ketan On 5/21/11 7:12 AM, Michael Wilde wrote: > Ketan, please package up and post a complete example that includes the 5 major files (.swift.=, sites, tc, properties, and fs.data) and show a run on both trunk and 0.92.1, with both log files and the _swiftwrap info log files. > > Thanks, > > Mike > > > ----- Original Message ----- >> Hello, >> >> Seems CDM with pbs provider is not working in the trunk version. >> Following is the CDM entry: >> >> rule .* DIRECT >> /lustre/beagle/ketan/labs/modftdock/production/campaign3 >> >> The input data resides in a folder called "input" in the above path. >> >> I do not get any error message but nothing seems to be executing since >> empty output files are created. 
>> >> >> Ketan >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hockyg at uchicago.edu Sat May 21 11:46:06 2011 From: hockyg at uchicago.edu (Glen Hocky) Date: Sat, 21 May 2011 12:46:06 -0400 Subject: [Swift-devel] recent error on beagle Message-ID: Does anyone know what this error means? It just started happening on beagle + coasters Swift log attached Swift version is Mike's: /home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast/bin/swift Thanks Glen Progress: Active:48 Progress: Active:48 queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) Progress: Active:47 Failed but can retry:1 Shutting down worker Shutting down worker Shutting down worker -------------- next part -------------- A non-text attachment was scrubbed... 
Name: glassRunSizeSwitching-20110521-0832-zdruneuf.log.gz Type: application/x-gzip Size: 56469 bytes Desc: not available URL: From wilde at mcs.anl.gov Sat May 21 12:40:27 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 12:40:27 -0500 (CDT) Subject: [Swift-devel] recent error on beagle In-Reply-To: Message-ID: <1763464705.92855.1305999627663.JavaMail.root@zimbra.anl.gov> Glen, I dont recognize it - need to dig deeper. Where can we find all the files (.swift, tc, sites, properties) and command line? You should try module load swift - thats the official 0.92.1 for Beagle. (But its possible the same as you are running here). When did this last work for you? Anything change in sites params? - Mike ----- Original Message ----- > Does anyone know what this error means? It just started happening on > beagle + coasters > Swift log attached > > Swift version is Mike's: > /home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast/bin/swift > > Thanks > Glen > > Progress: Active:48 > Progress: Active:48 > queuedsize > 0 but no job dequeued. Queued: {} > java.lang.Throwable > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) > queuedsize > 0 but no job dequeued. 
Queued: {} > java.lang.Throwable > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) > Progress: Active:47 Failed but can retry:1 > Shutting down worker > > Shutting down worker > > Shutting down worker > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hockyg at uchicago.edu Sat May 21 12:45:25 2011 From: hockyg at uchicago.edu (Glen Hocky) Date: Sat, 21 May 2011 13:45:25 -0400 Subject: [Swift-devel] recent error on beagle In-Reply-To: <1763464705.92855.1305999627663.JavaMail.root@zimbra.anl.gov> References: <1763464705.92855.1305999627663.JavaMail.root@zimbra.anl.gov> Message-ID: Thanks for looking, nothing has changed except some of the actual python script that is running, but that appears to be working b/c the simulation is generating output files. All run stuff for this job are /home/hockyg/reichman/glassy_dynamics/code/swift_pySimulate3/run/final/glassRunSizeSwitching-20110521-0832-zdruneuf.* all files like tc and sites can be found with ls /home/hockyg/reichman/glassy_dynamics/code/swift_pySimulate3/run/final/*20110521-0832* On Sat, May 21, 2011 at 1:40 PM, Michael Wilde wrote: > Glen, I dont recognize it - need to dig deeper. > Where can we find all the files (.swift, tc, sites, properties) and command line? > > You should try module load swift - thats the official 0.92.1 for Beagle. > (But its possible the same as you are running here). > > When did this last work for you? 
Anything change in sites params? > > - Mike > > ----- Original Message ----- >> Does anyone know what this error means? It just started happening on >> beagle + coasters >> Swift log attached >> >> Swift version is Mike's: >> /home/wilde/swift/rev/swift-r4143+cog-r3056+pbscoast/bin/swift >> >> Thanks >> Glen >> >> Progress: Active:48 >> Progress: Active:48 >> queuedsize > 0 but no job dequeued. Queued: {} >> java.lang.Throwable >> at >> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) >> at >> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) >> at >> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) >> queuedsize > 0 but no job dequeued. Queued: {} >> java.lang.Throwable >> at >> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) >> at >> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) >> at >> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) >> Progress: Active:47 Failed but can retry:1 >> Shutting down worker >> >> Shutting down worker >> >> Shutting down worker >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > From wilde at mcs.anl.gov Sat May 21 14:40:45 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 14:40:45 -0500 Subject: [Swift-devel] Re: Communicado is crashed - please restore In-Reply-To: References: <593670167.92727.1305991559114.JavaMail.root@zimbra.anl.gov>
Message-ID: The machine has "only" 14GB ram, so 16GB heap might thrash it On 5/21/11, Allan Espinosa wrote: > i'm running Swift with a java heap size of 16GB. Maybe you can start > diagnosing the crash from one of my processes. > > -Allan > > 2011/5/21 Michael Wilde : >> Hi CI Team, >> >> Communicado is not responding to ssh. >> >> Please restore service to it. >> >> As its crashing/hanging rather frequently, please help us diagnose the >> cause. >> >> Thanks, >> >> Mike >> >> > > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > -- Sent from my mobile device From hategan at mcs.anl.gov Sat May 21 16:02:39 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 21 May 2011 14:02:39 -0700 Subject: [Swift-devel] recent error on beagle In-Reply-To: References: Message-ID: <1306011759.29178.4.camel@blabla2.none> On Sat, 2011-05-21 at 12:46 -0400, Glen Hocky wrote: > Does anyone know what this error means? It just started happening on > queuedsize > 0 but no job dequeued. Queued: {} > java.lang.Throwable > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) > at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) This seems to be the same error that Sheri was seeing. I committed a fix to trunk. The issue is that the account of running jobs doesn't take into consideration the passing of time, whereas the account of the allocated blocks does. As time goes by things may get to a state where there appear to be more running jobs than possible. This can, however, also be triggered if for some reason the number of workers ends up being larger than what the service thinks it's starting. I suspect that in Sheri's run(s) that might also be the case. 
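A toy model of the accounting mismatch described above, offered as a sketch rather than the actual BlockQueueProcessor logic. It assumes block capacity is measured in worker-seconds that shrink as the block's walltime elapses, while running jobs are counted at their full requested walltime. All names here (Block, Job, accounting) are hypothetical and do not come from the coaster source; the time-aware variant is shown only for contrast and is not necessarily what the fix committed to trunk does.

```python
# Illustrative model only: names and formulas are hypothetical and are
# not taken from BlockQueueProcessor.
from dataclasses import dataclass


@dataclass
class Block:
    workers: int      # number of workers in the allocated block
    walltime: int     # seconds requested for the block
    started_at: int   # time at which the block started

    def remaining_capacity(self, now: int) -> int:
        """Worker-seconds left in the block; shrinks as time passes."""
        left = max(0, self.walltime - (now - self.started_at))
        return self.workers * left


@dataclass
class Job:
    walltime: int     # seconds requested for the job
    started_at: int   # time at which the job started running

    def size_static(self) -> int:
        """Time-unaware accounting: always the full requested walltime."""
        return self.walltime

    def size_time_aware(self, now: int) -> int:
        """Time-aware accounting: only the walltime still left to run."""
        return max(0, self.walltime - (now - self.started_at))


def accounting(block: Block, jobs: list, now: int) -> tuple:
    """Return (block capacity, static job total, time-aware job total)."""
    capacity = block.remaining_capacity(now)
    static = sum(job.size_static() for job in jobs)
    aware = sum(job.size_time_aware(now) for job in jobs)
    return capacity, static, aware


block = Block(workers=2, walltime=100, started_at=0)
jobs = [Job(walltime=80, started_at=0), Job(walltime=80, started_at=0)]

for now in (0, 30, 60):
    capacity, static, aware = accounting(block, jobs, now)
    # "overcommitted" is True once the static total exceeds the capacity.
    print(f"t={now} capacity={capacity} static={static} "
          f"time_aware={aware} overcommitted={static > capacity}")
```

With these numbers the time-unaware job total stays at 160 while the block capacity drops from 200 to 140 to 80, so from t=30 on the running jobs appear not to fit even though nothing about the jobs has changed. The time-aware total shrinks together with the capacity and never exceeds it in this example.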
Could you let me know if you are running with the stable branch? If that's the case I will port the fix there too. Mihael From wilde at mcs.anl.gov Sat May 21 17:12:03 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 17:12:03 -0500 (CDT) Subject: [Swift-devel] recent error on beagle In-Reply-To: <1306012169.7746.0.camel@blabla2.none> Message-ID: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote: > > as I mentioned, I've been running with Mike's swift which was > > patched > > for beagle. are all the things that make running on beagle work in > > trunk? > > No idea. > > Mike? Justin, working with Ketan, just applied changes to trunk which should make it work now on Beagle (or any Cray XT5+ or XE). This uses a different set of sites.xml tags than the prototype in the current Beagle swift 0.92.1 module. Justin has a note on this at: https://sites.google.com/site/swiftdevel/sites/pbs/cray It was working before for one-node worker jobs; now it should work for multi-node worker jobs as well. Justin and Ketan should comment on the state of testing and readiness of this trunk feature. Don't try trunk on Beagle till they give the go-ahead. - Mike > > If so i'll update to the latest and test. I don't think I'm > > using stable... > > Ok > > Mihael -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat May 21 17:13:41 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 21 May 2011 17:13:41 -0500 (CDT) Subject: [Swift-devel] recent error on beagle In-Reply-To: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> Message-ID: <274923083.93055.1306016021407.JavaMail.root@zimbra.anl.gov> er, I think the Cray guide below needs to get copied to the cookbook or user guide - the devel site may not be publicly readable. 
- Mike ----- Original Message ----- > ----- Original Message ----- > > On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote: > > > as I mentioned, I've been running with Mike's swift which was > > > patched > > > for beagle. are all the things that make running on beagle work in > > > trunk? > > > > No idea. > > > > Mike? > > Justin, working with Ketan, just applied changes to trunk which should > make it work now on Beagle (or any Cray XT5+ or XE). This uses a > different set of sites.xml tags than the prototype in the current > Beagle swift 0.92.1 module. Justin has a note on this at: > https://sites.google.com/site/swiftdevel/sites/pbs/cray > > It was working before for one-node worker jobs; now it should work for > multi-node worker jobs as well. > > Justin and Ketan should comment on the state of testing and readiness > of this trunk feature. Don't try trunk on Beagle till they give the > go-ahead. > > - Mike > > > > If so i'll update to the latest and test. I don't think I'm > > > using stable... > > > > Ok > > > > Mihael > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From ketancmaheshwari at gmail.com Sun May 22 08:39:15 2011 From: ketancmaheshwari at gmail.com (ketan) Date: Sun, 22 May 2011 08:39:15 -0500 Subject: [Swift-devel] recent error on beagle In-Reply-To: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> References: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> Message-ID: <4DD91203.6090000@gmail.com> I can confirm that the trunk is not usable for pbs provider. 
I am using trunk to submit jobs on Beagle and I see a few unexpected things: 1. The stderr is showing inconsistent messages: the results are getting written to the output even though stderr doesn't report any. 2. qsub jobs are being cancelled inadvertently: I submitted 40 of them yesterday, but only 2 survived until today. The log is here: http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log In addition, the ssh-pbs provider does not seem to be working for large runs (it worked for a small number of test runs): I am getting unexpected stdout. Following is the stdout: http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout Following is the log file for the above run: http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log Ketan On 5/21/11 5:12 PM, Michael Wilde wrote: > > ----- Original Message ----- >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote: >>> as I mentioned, I've been running with Mike's swift which was >>> patched >>> for beagle. are all the things that make running on beagle work in >>> trunk? >> No idea. >> >> Mike? > Justin, working with Ketan, just applied changes to trunk which should make it work now on Beagle (or any Cray XT5+ or XE). This uses a different set of sites.xml tags than the prototype in the current Beagle swift 0.92.1 module. Justin has a note on this at: > https://sites.google.com/site/swiftdevel/sites/pbs/cray > > It was working before for one-node worker jobs; now it should work for multi-node worker jobs as well. > > Justin and Ketan should comment on the state of testing and readiness of this trunk feature. Don't try trunk on Beagle till they give the go-ahead. > > - Mike > >>> If so i'll update to the latest and test. I don't think I'm >>> using stable...
>> Ok >> >> Mihael From hategan at mcs.anl.gov Sun May 22 13:51:39 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 22 May 2011 11:51:39 -0700 Subject: [Swift-devel] recent error on beagle In-Reply-To: <4DD91203.6090000@gmail.com> References: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> <4DD91203.6090000@gmail.com> Message-ID: <1306090299.2956.1.camel@blabla2.none> The second one looks to me like a coaster problem. Can't say much about the first issue. Can you try with plain pbs if you want to test the pbs provider? Mihael On Sun, 2011-05-22 at 08:39 -0500, ketan wrote: > I can confirm that the trunk is not usable for pbs provider. I am using > trunk for submitting jobs on beagle and I see a few unexpected things: > > 1. The stderr is showing inconsistent messages: The results are getting > written to the output even though stderr doesn't report any. > 2. qsub jobs being cancelled inadvertantly: I submitted 40 of them > yesterday, however, only 2 survived today. The log is here: > > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log > > In addition, the ssh-pbs provider does not seem to be working for large > runs (it worked for a small number of test runs): Getting unexpected > stdouts. Following is the stdout: > > http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout > > Following is the log file for the above run: > > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log > > > Ketan > > On 5/21/11 5:12 PM, Michael Wilde wrote: > > > > ----- Original Message ----- > >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote: > >>> as I mentioned, I've been running with Mike's swift which was > >>> patched > >>> for beagle. are all the things that make running on beagle work in > >>> trunk? > >> No idea. > >> > >> Mike? > > Justin, working with Ketan, just applied changes to trunk which should make it work now on Beagle (or any Cray XT5+ or XE). 
This uses a different set of sites.xml tags than the prototype in the current Beagle swift 0.92.1 module. Justin has a note on this at: > > https://sites.google.com/site/swiftdevel/sites/pbs/cray > > > > It was working before for one-node worker jobs; now it should work for multi-node worker jobs as well. > > > > Justin and Ketan should comment on the state of testing and readiness of this trunk feature. Don't try trunk on Beagle till they give the go-ahead. > > > > - Mike > > > >>> If so i'll update to the latest and test. I don't think I'm > >>> using stable... > >> Ok > >> > >> Mihael > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From davidkelly999 at gmail.com Mon May 23 10:24:20 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Mon, 23 May 2011 10:24:20 -0500 Subject: [Swift-devel] Re: Getting VMs from FG for use with swift In-Reply-To: <4DD70560.1020901@mcs.anl.gov> References: <4DD70560.1020901@mcs.anl.gov> Message-ID: Hi John, I now have a FutureGrid account and have been added to a project. I am now trying to get our scripts working together. I ran into a few problems at first when trying to run the FutureGrid scripts. On the first system I tried I was getting a traceback. It is possible that the system I was using has older versions of some of the needed libraries. Then I tried it on a system that is more frequently updated - my laptop running Ubuntu 10.10. It needed a newer version of the Python crypto tools installed, so I installed that (and the Python development libraries) and that part seems fine now. I am now up to the point of the install script where it is trying to register keys, but it is failing. My guess is that I need to change FUTUREGRID_IAAS_ACCESS_KEY and FUTUREGRID_IAAS_SECRET_KEY in env.sh. I'm not sure what these should be exactly.
Are these the contents of my ssh keys, an ssh key and a passphrase, or some other type of security? I've tried a few combinations of different things but haven't had much luck yet. Thanks! Regards, David Traceback from earlier: Installing setuptools.......................done. Complete output from command /autonfs/home/davidk/swift-vm-...ython /autonfs/home/davidk/swift-vm-...stall pip: Searching for pip Reading http://pypi.python.org/simple/pip/ Reading http://pip.openplans.org Reading http://www.pip-installer.org Best match: pip 1.0.1 Downloading http://pypi.python.org/packages/source/p/pip/pip-1.0.1.tar.gz#md5=28dcc70225e5bf925532abc5b087a94b Processing pip-1.0.1.tar.gz Running pip-1.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-GHsjHX/pip-1.0.1/egg-dist-tmp-rXjQ7L Traceback (most recent call last): File "/autonfs/home/davidk/swift-vm-boot/ve/bin/easy_install", line 8, in load_entry_point('setuptools==0.6c11', 'console_scripts', 'easy_install')() File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 1712, in main File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 1700, in with_ei_usage File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 1716, in File "/soft/python-2.6.1-r1/lib/python2.6/distutils/core.py", line 152, in setup dist.run_commands() File "/soft/python-2.6.1-r1/lib/python2.6/distutils/dist.py", line 975, in run_commands self.run_command(cmd) File "/soft/python-2.6.1-r1/lib/python2.6/distutils/dist.py", line 995, in run_command cmd_obj.run() File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 211, in run File 
"/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 446, in easy_install File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 476, in install_item File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 655, in install_eggs File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 930, in build_and_install File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", line 919, in run_setup File "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/sandbox.py", line 52, in run_setup AttributeError: 'module' object has no attribute '__getstate__' ---------------------------------------- Traceback (most recent call last): File "bin/virtualenv.py", line 1647, in main() File "bin/virtualenv.py", line 558, in main prompt=options.prompt) File "bin/virtualenv.py", line 656, in create_environment install_pip(py_executable) File "bin/virtualenv.py", line 415, in install_pip filter_stdout=_filter_setup) File "bin/virtualenv.py", line 624, in call_subprocess % (cmd_desc, proc.returncode)) OSError: Command /autonfs/home/davidk/swift-vm-...ython /autonfs/home/davidk/swift-vm-...stall pip failed with error code 1 Failed to created the needed python virtual environment On Fri, May 20, 2011 at 7:20 PM, John Bresnahan wrote: > Our phone call today left me motiviated to show you guys how easy it is to > get virtual machines for use with swift on FutureGrid. > > I made some small scripts around the Nimbus tool cloudinitd. The scripts > just make installing the software and running it trivial. 
With a single > command you can get N VMs from the FutureGrid Nimbus clouds (N can be on the > order of hundreds). When the tool is done it outputs a line separated list > of hostnames. All of these hostnames have root access available via your > ~/.ssh/id_rsa keys. > > If/when you have FutureGrid credentials, untar the attachment and give it a > try. There are a few minor configurations needed: > > > 1) edit the file env.sh and set your FutureGrid security credentials: > > % cat env.sh > export FUTUREGRID_IAAS_ACCESS_KEY=XXXXXXXXXXXXXXXXXX > export FUTUREGRID_IAAS_SECRET_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > > export FUTUREGRID_HOTEL_NODES=2 > export FUTUREGRID_SIERRA_NODES=2 > > You can also change the value '2' to be whatever number of VMs you want. > > > 2) install it on your system. (this single command downloads and installs > everything you need under the cwd): > > % ./install.sh > > 3) boot the VMs > % ./bin/bootit.sh. > > You will see much status output, but the last several lines will be the > hostnames acquired from the cloud. > > Let me know when you guys are ready to check this out! > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidkelly999 at gmail.com Tue May 24 10:21:22 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Tue, 24 May 2011 10:21:22 -0500 Subject: [Swift-devel] coaster-service -passive null pointer exception Message-ID: Hello, When trying to run coaster-service in passive mode, I get a null pointer exception when starting worker nodes. This seems to happen when the worker node tries to register with the coaster service. I get: Failed to process data: Failed to register (service returned error: java.lang.NullPointerException) at /home/davidk/work/worker.pl line 762. The exception gets thrown from RegistrationHandler.java. Running a swift script before starting the workers seems to be the workaround.. 
but I think with the -passive option we shouldn't need to do that anymore, is that correct? Thanks, David From hategan at mcs.anl.gov Tue May 24 12:47:58 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 May 2011 10:47:58 -0700 Subject: [Swift-devel] Re: coaster-service -passive null pointer exception In-Reply-To: References: Message-ID: <1306259278.31714.2.camel@blabla2.none> Do you have logs or exact stack trace for this? On Tue, 2011-05-24 at 10:21 -0500, David Kelly wrote: > Hello, > > When trying to run coaster-service in passive mode, I get a null > pointer exception when starting worker nodes. This seems to happen > when the worker node tries to register with the coaster service. I > get: > > Failed to process data: Failed to register (service returned error: > java.lang.NullPointerException) at /home/davidk/work/worker.pl line > 762. > > The exception gets thrown from RegistrationHandler.java. Running a > swift script before starting the workers seems to be the workaround.. > but I think with the -passive option we shouldn't need to do that > anymore, is that correct? > > Thanks, > David > From davidkelly999 at gmail.com Tue May 24 13:44:39 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Tue, 24 May 2011 13:44:39 -0500 Subject: [Swift-devel] Re: coaster-service -passive null pointer exception In-Reply-To: <1306259278.31714.2.camel@blabla2.none> References: <1306259278.31714.2.camel@blabla2.none> Message-ID: Attached is the coaster.log that is being generated, but it doesn't contain much information. At the point where the exception was being thrown, I called getStackTrace() to get more info and saved the result.
Hope this helps: org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.nextId(JobQueue.java:122) org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:58) org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:41) org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:390) org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:158) org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:377) David On Tue, May 24, 2011 at 12:47 PM, Mihael Hategan wrote: > Do you have logs or exact stack trace for this? > > On Tue, 2011-05-24 at 10:21 -0500, David Kelly wrote: > > Hello, > > > > When trying to run coaster-service in passive mode, I get a null > > pointer exception when starting worker nodes. This seems to happen > > when the worker node tries to register with the coaster service. I > > get: > > > > Failed to process data: Failed to register (service returned error: > > java.lang.NullPointerException) at /home/davidk/work/worker.pl line > > 762. > > > > The exception gets thrown from RegistrationHandler.java. Running a > > swift script before starting the workers seems to be the workaround.. > > but I think with the -passive option we shouldn't need to do that > > anymore, is that correct? > > > > Thanks, > > David > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: coaster.log Type: text/x-log Size: 2432 bytes Desc: not available URL: From hategan at mcs.anl.gov Tue May 24 13:57:35 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 May 2011 11:57:35 -0700 Subject: [Swift-devel] Re: coaster-service -passive null pointer exception In-Reply-To: References: <1306259278.31714.2.camel@blabla2.none> Message-ID: <1306263455.523.2.camel@blabla2.none> I committed a tentative fix to trunk. I won't swear by it, but please give it a try and let me know how it works. Mihael On Tue, 2011-05-24 at 13:44 -0500, David Kelly wrote: > Attached is the coaster.log that is being generated, but it doesn't > contain much information. > > But where it was throwing the exception, I called getStrackTrace() to > get more info and saved it. Hope this helps: > > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.nextId(JobQueue.java:122) > org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:58) > org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:41) > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:390) > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:158) > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:377) > > David > > On Tue, May 24, 2011 at 12:47 PM, Mihael Hategan > wrote: > Do you have logs or exact stack trace for this? > > > On Tue, 2011-05-24 at 10:21 -0500, David Kelly wrote: > > Hello, > > > > When trying to run coaster-service in passive mode, I get a > null > > pointer exception when starting worker nodes. 
This seems to > happen > > when the worker node tries to register with the coaster > service. I > > get: > > > > Failed to process data: Failed to register (service returned > error: > > java.lang.NullPointerException) > at /home/davidk/work/worker.pl line > > 762. > > > > The exception gets thrown from RegistrationHandler.java. > Running a > > swift script before starting the workers seems to be the > workaround.. > > but I think with the -passive option we shouldn't need to do > that > > anymore, is that correct? > > > > Thanks, > > David > > > > > > From davidkelly999 at gmail.com Tue May 24 14:15:15 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Tue, 24 May 2011 14:15:15 -0500 Subject: [Swift-devel] Re: coaster-service -passive null pointer exception In-Reply-To: <1306263455.523.2.camel@blabla2.none> References: <1306259278.31714.2.camel@blabla2.none> <1306263455.523.2.camel@blabla2.none> Message-ID: Thanks. I gave it a try and this is what I am seeing now: Failed to connect: Connection refused at /home/davidk/work/worker.pl line 300. from coaster.log: Error starting coaster service: null Error starting coaster service java.lang.NullPointerException at org.globus.cog.abstraction.coaster.service.CoasterPersistentService.main(CoasterPersistentService.java:145) David On Tue, May 24, 2011 at 1:57 PM, Mihael Hategan wrote: > I committed a tentative fix to trunk. I won't swear by it, but please > give it a try and let me know how it works. > > Mihael > > On Tue, 2011-05-24 at 13:44 -0500, David Kelly wrote: > > Attached is the coaster.log that is being generated, but it doesn't > > contain much information. > > > > But where it was throwing the exception, I called getStrackTrace() to > > get more info and saved it. 
Hope this helps: > > > > > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.nextId(JobQueue.java:122) > > > org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:58) > > > org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:41) > > > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > > > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:390) > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:158) > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:377) > > > > David > > > > On Tue, May 24, 2011 at 12:47 PM, Mihael Hategan > > wrote: > > Do you have logs or exact stack trace for this? > > > > > > On Tue, 2011-05-24 at 10:21 -0500, David Kelly wrote: > > > Hello, > > > > > > When trying to run coaster-service in passive mode, I get a > > null > > > pointer exception when starting worker nodes. This seems to > > happen > > > when the worker node tries to register with the coaster > > service. I > > > get: > > > > > > Failed to process data: Failed to register (service returned > > error: > > > java.lang.NullPointerException) > > at /home/davidk/work/worker.pl line > > > 762. > > > > > > The exception gets thrown from RegistrationHandler.java. > > Running a > > > swift script before starting the workers seems to be the > > workaround.. > > > but I think with the -passive option we shouldn't need to do > > that > > > anymore, is that correct? > > > > > > Thanks, > > > David > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hategan at mcs.anl.gov Tue May 24 14:27:39 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 24 May 2011 12:27:39 -0700 Subject: [Swift-devel] Re: coaster-service -passive null pointer exception In-Reply-To: References: <1306259278.31714.2.camel@blabla2.none> <1306263455.523.2.camel@blabla2.none> Message-ID: <1306265259.1890.1.camel@blabla2.none> Sorry about that. Attempt #2 is in svn. Mihael On Tue, 2011-05-24 at 14:15 -0500, David Kelly wrote: > Thanks. I gave it a try and this is what I am seeing now: > > Failed to connect: Connection refused at /home/davidk/work/worker.pl > line 300. > > from coaster.log: > Error starting coaster service: null > Error starting coaster service > java.lang.NullPointerException > at > org.globus.cog.abstraction.coaster.service.CoasterPersistentService.main(CoasterPersistentService.java:145) > > David > > On Tue, May 24, 2011 at 1:57 PM, Mihael Hategan > wrote: > I committed a tentative fix to trunk. I won't swear by it, but > please > give it a try and let me know how it works. > > Mihael > > > On Tue, 2011-05-24 at 13:44 -0500, David Kelly wrote: > > Attached is the coaster.log that is being generated, but it > doesn't > > contain much information. > > > > But where it was throwing the exception, I called > getStrackTrace() to > > get more info and saved it. 
Hope this helps: > > > > > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.nextId(JobQueue.java:122) > > > org.globus.cog.abstraction.coaster.service.LocalTCPService.registrationReceived(LocalTCPService.java:58) > > > org.globus.cog.abstraction.coaster.service.local.RegistrationHandler.requestComplete(RegistrationHandler.java:41) > > > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:84) > > > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:390) > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.step(AbstractStreamKarajanChannel.java:158) > > > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Multiplexer.run(AbstractStreamKarajanChannel.java:377) > > > > David > > > > On Tue, May 24, 2011 at 12:47 PM, Mihael Hategan > > > wrote: > > Do you have logs or exact stack trace for this? > > > > > > On Tue, 2011-05-24 at 10:21 -0500, David Kelly > wrote: > > > Hello, > > > > > > When trying to run coaster-service in passive > mode, I get a > > null > > > pointer exception when starting worker nodes. This > seems to > > happen > > > when the worker node tries to register with the > coaster > > service. I > > > get: > > > > > > Failed to process data: Failed to register > (service returned > > error: > > > java.lang.NullPointerException) > > at /home/davidk/work/worker.pl line > > > 762. > > > > > > The exception gets thrown from > RegistrationHandler.java. > > Running a > > > swift script before starting the workers seems to > be the > > workaround.. > > > but I think with the -passive option we shouldn't > need to do > > that > > > anymore, is that correct? 
> > > > > > Thanks, > > > David > > > > > > > > > > > > > > > From davidkelly999 at gmail.com Tue May 24 14:34:30 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Tue, 24 May 2011 14:34:30 -0500 Subject: [Swift-devel] Re: coaster-service -passive null pointer exception In-Reply-To: <1306265259.1890.1.camel@blabla2.none> References: <1306259278.31714.2.camel@blabla2.none> <1306263455.523.2.camel@blabla2.none> <1306265259.1890.1.camel@blabla2.none> Message-ID: That seems to have done the trick. The workers have started and I have run a few scripts now. Thank you! David On Tue, May 24, 2011 at 2:27 PM, Mihael Hategan wrote: > Sorry about that. Attempt #2 is in svn. > > Mihael > > On Tue, 2011-05-24 at 14:15 -0500, David Kelly wrote: > > Thanks. I gave it a try and this is what I am seeing now: > > > > Failed to connect: Connection refused at /home/davidk/work/worker.pl > > line 300. > > > > from coaster.log: > > Error starting coaster service: null > > Error starting coaster service > > java.lang.NullPointerException > > at > > > org.globus.cog.abstraction.coaster.service.CoasterPersistentService.main(CoasterPersistentService.java:145) > > > > David > > > > On Tue, May 24, 2011 at 1:57 PM, Mihael Hategan > > wrote: > > I committed a tentative fix to trunk. I won't swear by it, but > > please > > give it a try and let me know how it works. > > > > Mihael > > > > > > On Tue, 2011-05-24 at 13:44 -0500, David Kelly wrote: > > > Attached is the coaster.log that is being generated, but it > > doesn't > > > contain much information. > > > > > > But where it was throwing the exception, I called > > getStrackTrace() to > > > get more info and saved it. 
> > > > Thanks,
> > > > David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bresnaha at mcs.anl.gov Tue May 24 15:18:47 2011
From: bresnaha at mcs.anl.gov (John Bresnahan)
Date: Tue, 24 May 2011 10:18:47 -1000
Subject: [Swift-devel] Re: Getting VMs from FG for use with swift
In-Reply-To:
References: <4DD70560.1020901@mcs.anl.gov>
Message-ID: <4DDC12A7.8010304@mcs.anl.gov>

The GPFS server on the FG cluster hotel died yesterday, so I cannot get you your credentials. I'll get back to you when it is up again. Once it is back, the process for getting the needed access keys is described here:

https://portal.futuregrid.org/tutorials/nimbus

On 05/23/2011 05:24 AM, David Kelly wrote:
> Hi John,
>
> I now have a futuregrid account and am added to a project. I am now trying to get our scripts
> working together.
>
> I ran into a few problems at first when trying to run the futuregrid scripts. On the first system I
> tried I was getting a traceback. It is possible that the system I was using has older versions of
> some of the needed libraries. Then I tried it on a system that is more frequently updated - my
> laptop running Ubuntu 10.10. It needed a newer version of the Python crypto tools installed, so I
> installed that (and the python development libraries) and that part seems fine now.
>
> I am now up to the point of the install script where it is trying to register keys, but it is
> failing. My guess is that I need to change FUTUREGRID_IAAS_ACCESS_KEY and FUTUREGRID_IAAS_SECRET_KEY
> in env.sh. I'm not sure what these should be exactly. Are these the contents of my ssh keys, an ssh
> key and a passphrase, or some other type of security? I've tried a few combinations of different
> things but haven't had much luck yet.
>
> Thanks!
>
> Regards,
> David
>
>
> Traceback from earlier:
> Installing setuptools.......................done.
> Complete output from command /autonfs/home/davidk/swift-vm-...ython > /autonfs/home/davidk/swift-vm-...stall pip: > Searching for pip > Reading http://pypi.python.org/simple/pip/ > Reading http://pip.openplans.org > Reading http://www.pip-installer.org > Best match: pip 1.0.1 > Downloading > http://pypi.python.org/packages/source/p/pip/pip-1.0.1.tar.gz#md5=28dcc70225e5bf925532abc5b087a94b > Processing pip-1.0.1.tar.gz > Running pip-1.0.1/setup.py -q bdist_egg --dist-dir > /tmp/easy_install-GHsjHX/pip-1.0.1/egg-dist-tmp-rXjQ7L > Traceback (most recent call last): > File "/autonfs/home/davidk/swift-vm-boot/ve/bin/easy_install", line 8, in > load_entry_point('setuptools==0.6c11', 'console_scripts', 'easy_install')() > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 1712, in main > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 1700, in with_ei_usage > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 1716, in > File "/soft/python-2.6.1-r1/lib/python2.6/distutils/core.py", line 152, in setup > dist.run_commands() > File "/soft/python-2.6.1-r1/lib/python2.6/distutils/dist.py", line 975, in run_commands > self.run_command(cmd) > File "/soft/python-2.6.1-r1/lib/python2.6/distutils/dist.py", line 995, in run_command > cmd_obj.run() > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 211, in run > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 446, in easy_install > File > 
"/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 476, in install_item > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 655, in install_eggs > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 930, in build_and_install > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/command/easy_install.py", > line 919, in run_setup > File > "/autonfs/home/davidk/swift-vm-boot/ve/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/setuptools/sandbox.py", > line 52, in run_setup > AttributeError: 'module' object has no attribute '__getstate__' > ---------------------------------------- > Traceback (most recent call last): > File "bin/virtualenv.py", line 1647, in > main() > File "bin/virtualenv.py", line 558, in main > prompt=options.prompt) > File "bin/virtualenv.py", line 656, in create_environment > install_pip(py_executable) > File "bin/virtualenv.py", line 415, in install_pip > filter_stdout=_filter_setup) > File "bin/virtualenv.py", line 624, in call_subprocess > % (cmd_desc, proc.returncode)) > OSError: Command /autonfs/home/davidk/swift-vm-...ython /autonfs/home/davidk/swift-vm-...stall pip > failed with error code 1 > Failed to created the needed python virtual environment > > On Fri, May 20, 2011 at 7:20 PM, John Bresnahan > > wrote: > > Our phone call today left me motiviated to show you guys how easy it is to get virtual machines > for use with swift on FutureGrid. > > I made some small scripts around the Nimbus tool cloudinitd. The scripts just make installing > the software and running it trivial. With a single command you can get N VMs from the > FutureGrid Nimbus clouds (N can be on the order of hundreds). 
When the tool is done it outputs
> a line separated list of hostnames. All of these hostnames have root access available via your
> ~/.ssh/id_rsa keys.
>
> If/when you have FutureGrid credentials, untar the attachment and give it a try. There are a
> few minor configurations needed:
>
> 1) edit the file env.sh and set your FutureGrid security credentials:
>
> % cat env.sh
> export FUTUREGRID_IAAS_ACCESS_KEY=XXXXXXXXXXXXXXXXXX
> export FUTUREGRID_IAAS_SECRET_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>
> export FUTUREGRID_HOTEL_NODES=2
> export FUTUREGRID_SIERRA_NODES=2
>
> You can also change the value '2' to be whatever number of VMs you want.
>
> 2) install it on your system. (this single command downloads and installs everything you need
> under the cwd):
>
> % ./install.sh
>
> 3) boot the VMs
> % ./bin/bootit.sh.
> You will see much status output, but the last several lines will be the hostnames acquired from
> the cloud.
>
> Let me know when you guys are ready to check this out!
>
>

From davidkelly999 at gmail.com Wed May 25 11:44:17 2011
From: davidkelly999 at gmail.com (David Kelly)
Date: Wed, 25 May 2011 11:44:17 -0500
Subject: [Swift-devel] Re: coaster-service -passive null pointer exception
In-Reply-To: <1306265259.1890.1.camel@blabla2.none>
References: <1306259278.31714.2.camel@blabla2.none> <1306263455.523.2.camel@blabla2.none> <1306265259.1890.1.camel@blabla2.none>
Message-ID:

Hey Mihael,

I was testing my script a little more today and ran into some errors related to CONFIGSERVICE. In this example I was testing passive mode but manually specifying ports instead of using the port files. I'm not sure if that makes a difference - I included an output of ps so you could see how each of the apps were being called. There is more detail in the swift log and in logs/coaster.log.

Thanks,
David

On Tue, May 24, 2011 at 2:27 PM, Mihael Hategan wrote:
> Sorry about that. Attempt #2 is in svn.
>
> Mihael
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configservice-error.tar.gz
Type: application/x-gzip
Size: 7062 bytes
Desc: not available
URL:

From wozniak at mcs.anl.gov Wed May 25 12:29:30 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 25 May 2011 12:29:30 -0500 (Central Daylight Time)
Subject: [Swift-devel] No concall today
Message-ID:

Let's skip the call today...

--
Justin M Wozniak

From davidkelly999 at gmail.com Wed May 25 12:34:12 2011
From: davidkelly999 at gmail.com (David Kelly)
Date: Wed, 25 May 2011 12:34:12 -0500
Subject: [Swift-devel] Re: coaster-service -passive null pointer exception
In-Reply-To:
References: <1306259278.31714.2.camel@blabla2.none> <1306263455.523.2.camel@blabla2.none> <1306265259.1890.1.camel@blabla2.none>
Message-ID:

Please disregard the previous email.
The problem was that the sites.xml that gensites was generating was connecting to the wrong coaster port. It is working fine now, both with auto ports and manual ports. Thanks, David On Wed, May 25, 2011 at 11:44 AM, David Kelly wrote: > Hey Mihael, > > I was testing my script a little more today and ran into some errors > related to CONFIGSERVICE. In this example I was testing passive mode but > manually specifying ports instead of using the port files. I'm not sure if > that makes a difference - I included an output of ps so you could see how > each of the apps were being called. There is more detail in the swift log > and in logs/coaster.log. > > Thanks, > David > > > > On Tue, May 24, 2011 at 2:27 PM, Mihael Hategan wrote: > >> Sorry about that. Attempt #2 is in svn. >> >> Mihael >> >> On Tue, 2011-05-24 at 14:15 -0500, David Kelly wrote: >> > Thanks. I gave it a try and this is what I am seeing now: >> > >> > Failed to connect: Connection refused at /home/davidk/work/worker.pl >> > line 300. >> > >> > from coaster.log: >> > Error starting coaster service: null >> > Error starting coaster service >> > java.lang.NullPointerException >> > at >> > >> org.globus.cog.abstraction.coaster.service.CoasterPersistentService.main(CoasterPersistentService.java:145) >> > >> > David >> > >> > On Tue, May 24, 2011 at 1:57 PM, Mihael Hategan >> > wrote: >> > I committed a tentative fix to trunk. I won't swear by it, but >> > please >> > give it a try and let me know how it works. >> > >> > Mihael >> > >> > >> > On Tue, 2011-05-24 at 13:44 -0500, David Kelly wrote: >> > > Attached is the coaster.log that is being generated, but it >> > doesn't >> > > contain much information. >> > > >> > > But where it was throwing the exception, I called >> > getStrackTrace() to >> > > get more info and saved it. 
>> > > > >> > > > Thanks, >> > > > David >> > > > >> > > >> > > >> > > >> > > >> > >> > >> > >> > >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wozniak at mcs.anl.gov Wed May 25 13:20:51 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 25 May 2011 13:20:51 -0500 (Central Daylight Time) Subject: [Swift-devel] DataNode.toString() In-Reply-To: <1305672666.3986.2.camel@blabla2.none> References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> Message-ID: This also affects the behavior of trace(). Might it impact other Swift functionality like mappers? On Tue, 17 May 2011, Mihael Hategan wrote: > Indeed. I committed a fix to unwrap swift data when passing values to > execute() (stdin, out, err, and the arguments in particular). > > I don't think toString() should be "overloaded" like it was. > > > On Tue, 2011-05-17 at 16:56 -0500, Justin M Wozniak wrote: >> I think this change affects arguments to apps: >> >> type file; >> >> (file f) echo (int i) { >> app { echo i stdout=@f; } >> } >> >> int greetings = 2; >> file hw = echo(greetings); >> >> ------> >> >> DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo >> arguments=[greetings:int = 2.0 - Closed] >> ... >> >> On Fri, 13 May 2011, Mihael Hategan wrote: >> >>> I changed that in trunk. It used to be: >>> >>> org.griphyn.vdl.mapping.RootDataNode identifier >>> dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at >>> dataset=sgt_var (not closed) >>> >>> That was annoying, noisy, and I had no idea what's what. >>> >>> It is now: >>> >>> name:type = value - [Open/Closed] >>> >>> The provenance data should still be the same, but it may not. So please >>> let me know if anything breaks. 
>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > > > -- Justin M Wozniak From hategan at mcs.anl.gov Wed May 25 14:26:18 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 May 2011 12:26:18 -0700 Subject: [Swift-devel] DataNode.toString() In-Reply-To: References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> Message-ID: <1306351578.20662.1.camel@blabla2.none> On Wed, 2011-05-25 at 13:20 -0500, Justin M Wozniak wrote: > This also affects the behavior of trace(). Yes. That was the intent. Removing the noise from swiftData.toString(). > Might it impact other Swift > functionality like mappers? I don't think so, but if you can think of a specific scenario, I'm listening. > > On Tue, 17 May 2011, Mihael Hategan wrote: > > > Indeed. I committed a fix to unwrap swift data when passing values to > > execute() (stdin, out, err, and the arguments in particular). > > > > I don't think toString() should be "overloaded" like it was. > > > > > > On Tue, 2011-05-17 at 16:56 -0500, Justin M Wozniak wrote: > >> I think this change affects arguments to apps: > >> > >> type file; > >> > >> (file f) echo (int i) { > >> app { echo i stdout=@f; } > >> } > >> > >> int greetings = 2; > >> file hw = echo(greetings); > >> > >> ------> > >> > >> DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo > >> arguments=[greetings:int = 2.0 - Closed] > >> ... > >> > >> On Fri, 13 May 2011, Mihael Hategan wrote: > >> > >>> I changed that in trunk. It used to be: > >>> > >>> org.griphyn.vdl.mapping.RootDataNode identifier > >>> dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at > >>> dataset=sgt_var (not closed) > >>> > >>> That was annoying, noisy, and I had no idea what's what. 
> >>> > >>> It is now: > >>> > >>> name:type = value - [Open/Closed] > >>> > >>> The provenance data should still be the same, but it may not. So please > >>> let me know if anything breaks. > >>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >> > > > > > > > From wozniak at mcs.anl.gov Wed May 25 14:53:15 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 25 May 2011 14:53:15 -0500 (Central Daylight Time) Subject: [Swift-devel] DataNode.toString() In-Reply-To: <1306351578.20662.1.camel@blabla2.none> References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> <1306351578.20662.1.camel@blabla2.none> Message-ID: Ok, just so we're all on the same page: trace("hi", "all"); used to say: SwiftScript trace: hi, all now says: SwiftScript trace: ?:string = hi - Closed, ?:string = all - Closed On Wed, 25 May 2011, Mihael Hategan wrote: > On Wed, 2011-05-25 at 13:20 -0500, Justin M Wozniak wrote: >> This also affects the behavior of trace(). > > Yes. That was the intent. Removing the noise from swiftData.toString(). > >> Might it impact other Swift >> functionality like mappers? > > I don't think so, but if you can think of a specific scenario, I'm > listening. > >> >> On Tue, 17 May 2011, Mihael Hategan wrote: >> >>> Indeed. I committed a fix to unwrap swift data when passing values to >>> execute() (stdin, out, err, and the arguments in particular). >>> >>> I don't think toString() should be "overloaded" like it was. 
--
Justin M Wozniak

From hategan at mcs.anl.gov Wed May 25 15:03:11 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 May 2011 13:03:11 -0700
Subject: [Swift-devel] DataNode.toString()
In-Reply-To:
References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> <1306351578.20662.1.camel@blabla2.none>
Message-ID: <1306353791.22321.3.camel@blabla2.none>

data.toString() had this hack: if there was a value, it would return that value (which was assumed usable in functional code); otherwise it would return the cryptic stuff.

So, depending on whether we want the old behavior in trace and tracef, I can add a .getValue() where needed.

So do we?
Mihael On Wed, 2011-05-25 at 14:53 -0500, Justin M Wozniak wrote: > Ok, just so we're all on the same page: > > trace("hi", "all"); > > used to say: > > SwiftScript trace: hi, all > > now says: > > SwiftScript trace: ?:string = hi - Closed, ?:string = all - Closed > > On Wed, 25 May 2011, Mihael Hategan wrote: > > > On Wed, 2011-05-25 at 13:20 -0500, Justin M Wozniak wrote: > >> This also affects the behavior of trace(). > > > > Yes. That was the intent. Removing the noise from swiftData.toString(). > > > >> Might it impact other Swift > >> functionality like mappers? > > > > I don't think so, but if you can think of a specific scenario, I'm > > listening. > > > >> > >> On Tue, 17 May 2011, Mihael Hategan wrote: > >> > >>> Indeed. I committed a fix to unwrap swift data when passing values to > >>> execute() (stdin, out, err, and the arguments in particular). > >>> > >>> I don't think toString() should be "overloaded" like it was. > >>> > >>> > >>> On Tue, 2011-05-17 at 16:56 -0500, Justin M Wozniak wrote: > >>>> I think this change affects arguments to apps: > >>>> > >>>> type file; > >>>> > >>>> (file f) echo (int i) { > >>>> app { echo i stdout=@f; } > >>>> } > >>>> > >>>> int greetings = 2; > >>>> file hw = echo(greetings); > >>>> > >>>> ------> > >>>> > >>>> DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo > >>>> arguments=[greetings:int = 2.0 - Closed] > >>>> ... > >>>> > >>>> On Fri, 13 May 2011, Mihael Hategan wrote: > >>>> > >>>>> I changed that in trunk. It used to be: > >>>>> > >>>>> org.griphyn.vdl.mapping.RootDataNode identifier > >>>>> dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at > >>>>> dataset=sgt_var (not closed) > >>>>> > >>>>> That was annoying, noisy, and I had no idea what's what. > >>>>> > >>>>> It is now: > >>>>> > >>>>> name:type = value - [Open/Closed] > >>>>> > >>>>> The provenance data should still be the same, but it may not. So please > >>>>> let me know if anything breaks. 
> >>>>> > >>>>> _______________________________________________ > >>>>> Swift-devel mailing list > >>>>> Swift-devel at ci.uchicago.edu > >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>> > >>>> > >>> > >>> > >>> > >> > > > > > > > From wozniak at mcs.anl.gov Wed May 25 15:36:53 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 25 May 2011 15:36:53 -0500 (Central Daylight Time) Subject: [Swift-devel] DataNode.toString() In-Reply-To: <1306353791.22321.3.camel@blabla2.none> References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> <1306351578.20662.1.camel@blabla2.none> <1306353791.22321.3.camel@blabla2.none> Message-ID: I think we do want the old behavior for trace(). I can update tracef(). On Wed, 25 May 2011, Mihael Hategan wrote: > data.toString() had this hack that if there was a value, it would return > that value (which was assumed useable in fuctional code), otherwise it > would return the cryptic stuff. > > So depending on whether we want the old behavior or not is trace and > tracef, I can add a .getValue() where needed. > > So do we? > > Mihael > > On Wed, 2011-05-25 at 14:53 -0500, Justin M Wozniak wrote: >> Ok, just so we're all on the same page: >> >> trace("hi", "all"); >> >> used to say: >> >> SwiftScript trace: hi, all >> >> now says: >> >> SwiftScript trace: ?:string = hi - Closed, ?:string = all - Closed >> >> On Wed, 25 May 2011, Mihael Hategan wrote: >> >>> On Wed, 2011-05-25 at 13:20 -0500, Justin M Wozniak wrote: >>>> This also affects the behavior of trace(). >>> >>> Yes. That was the intent. Removing the noise from swiftData.toString(). >>> >>>> Might it impact other Swift >>>> functionality like mappers? >>> >>> I don't think so, but if you can think of a specific scenario, I'm >>> listening. >>> >>>> >>>> On Tue, 17 May 2011, Mihael Hategan wrote: >>>> >>>>> Indeed. 
I committed a fix to unwrap swift data when passing values to >>>>> execute() (stdin, out, err, and the arguments in particular). >>>>> >>>>> I don't think toString() should be "overloaded" like it was. >>>>> >>>>> >>>>> On Tue, 2011-05-17 at 16:56 -0500, Justin M Wozniak wrote: >>>>>> I think this change affects arguments to apps: >>>>>> >>>>>> type file; >>>>>> >>>>>> (file f) echo (int i) { >>>>>> app { echo i stdout=@f; } >>>>>> } >>>>>> >>>>>> int greetings = 2; >>>>>> file hw = echo(greetings); >>>>>> >>>>>> ------> >>>>>> >>>>>> DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo >>>>>> arguments=[greetings:int = 2.0 - Closed] >>>>>> ... >>>>>> >>>>>> On Fri, 13 May 2011, Mihael Hategan wrote: >>>>>> >>>>>>> I changed that in trunk. It used to be: >>>>>>> >>>>>>> org.griphyn.vdl.mapping.RootDataNode identifier >>>>>>> dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at >>>>>>> dataset=sgt_var (not closed) >>>>>>> >>>>>>> That was annoying, noisy, and I had no idea what's what. >>>>>>> >>>>>>> It is now: >>>>>>> >>>>>>> name:type = value - [Open/Closed] >>>>>>> >>>>>>> The provenance data should still be the same, but it may not. So please >>>>>>> let me know if anything breaks. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > > > > -- Justin M Wozniak From hategan at mcs.anl.gov Wed May 25 15:53:21 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 May 2011 13:53:21 -0700 Subject: [Swift-devel] DataNode.toString() In-Reply-To: References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> <1306351578.20662.1.camel@blabla2.none> <1306353791.22321.3.camel@blabla2.none> Message-ID: <1306356801.24155.0.camel@blabla2.none> Fix in trunk r4530. 
On Wed, 2011-05-25 at 15:36 -0500, Justin M Wozniak wrote: > I think we do want the old behavior for trace(). I can update tracef(). > > On Wed, 25 May 2011, Mihael Hategan wrote: > > > data.toString() had this hack that if there was a value, it would return > > that value (which was assumed useable in fuctional code), otherwise it > > would return the cryptic stuff. > > > > So depending on whether we want the old behavior or not is trace and > > tracef, I can add a .getValue() where needed. > > > > So do we? > > > > Mihael > > > > On Wed, 2011-05-25 at 14:53 -0500, Justin M Wozniak wrote: > >> Ok, just so we're all on the same page: > >> > >> trace("hi", "all"); > >> > >> used to say: > >> > >> SwiftScript trace: hi, all > >> > >> now says: > >> > >> SwiftScript trace: ?:string = hi - Closed, ?:string = all - Closed > >> > >> On Wed, 25 May 2011, Mihael Hategan wrote: > >> > >>> On Wed, 2011-05-25 at 13:20 -0500, Justin M Wozniak wrote: > >>>> This also affects the behavior of trace(). > >>> > >>> Yes. That was the intent. Removing the noise from swiftData.toString(). > >>> > >>>> Might it impact other Swift > >>>> functionality like mappers? > >>> > >>> I don't think so, but if you can think of a specific scenario, I'm > >>> listening. > >>> > >>>> > >>>> On Tue, 17 May 2011, Mihael Hategan wrote: > >>>> > >>>>> Indeed. I committed a fix to unwrap swift data when passing values to > >>>>> execute() (stdin, out, err, and the arguments in particular). > >>>>> > >>>>> I don't think toString() should be "overloaded" like it was. 
> >>>>> > >>>>> > >>>>> On Tue, 2011-05-17 at 16:56 -0500, Justin M Wozniak wrote: > >>>>>> I think this change affects arguments to apps: > >>>>>> > >>>>>> type file; > >>>>>> > >>>>>> (file f) echo (int i) { > >>>>>> app { echo i stdout=@f; } > >>>>>> } > >>>>>> > >>>>>> int greetings = 2; > >>>>>> file hw = echo(greetings); > >>>>>> > >>>>>> ------> > >>>>>> > >>>>>> DEBUG vdl:execute2 JOB_START jobid=echo-tnyfi9ak tr=echo > >>>>>> arguments=[greetings:int = 2.0 - Closed] > >>>>>> ... > >>>>>> > >>>>>> On Fri, 13 May 2011, Mihael Hategan wrote: > >>>>>> > >>>>>>> I changed that in trunk. It used to be: > >>>>>>> > >>>>>>> org.griphyn.vdl.mapping.RootDataNode identifier > >>>>>>> dataset:20110512-2343-5rl3b7x5:720000000072 type Sgt with no value at > >>>>>>> dataset=sgt_var (not closed) > >>>>>>> > >>>>>>> That was annoying, noisy, and I had no idea what's what. > >>>>>>> > >>>>>>> It is now: > >>>>>>> > >>>>>>> name:type = value - [Open/Closed] > >>>>>>> > >>>>>>> The provenance data should still be the same, but it may not. So please > >>>>>>> let me know if anything breaks. > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Swift-devel mailing list > >>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > >>> > >> > > > > > > > > > From hategan at mcs.anl.gov Wed May 25 21:22:13 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 25 May 2011 19:22:13 -0700 Subject: [Swift-devel] DataNode.toString() In-Reply-To: <1306356801.24155.0.camel@blabla2.none> References: <1305329222.3494.18.camel@blabla2.none> <1305672666.3986.2.camel@blabla2.none> <1306351578.20662.1.camel@blabla2.none> <1306353791.22321.3.camel@blabla2.none> <1306356801.24155.0.camel@blabla2.none> Message-ID: <1306376533.31856.3.camel@blabla2.none> What it does break, however, is the provenance stuff. So I'd like to know: 1. If anybody uses that 2. 
If we want it 3. If there is a way to combine the two (i.e. a simple provenance id combined with the current toString()) I would assume "maybe" for 1, "yes" for 2, and I'd like to think "yes" for 3 (Ben, what exactly is relevant WRT the provenance tools in the dshandle identifying string?). Mihael On Wed, 2011-05-25 at 13:53 -0700, Mihael Hategan wrote: > Fix in trunk r4530.
From tim.g.armstrong at gmail.com Thu May 26 14:59:46 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Thu, 26 May 2011 14:59:46 -0500 Subject: [Swift-devel] recent error on beagle In-Reply-To: <1306090299.2956.1.camel@blabla2.none> References: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> <4DD91203.6090000@gmail.com> <1306090299.2956.1.camel@blabla2.none> Message-ID: Hi, I've encountered this issue with SwiftR, running release 0.92 from the svn repository. The issue occurs when GLOBUS::maxWallTime="03:55:00" in tc and maxTime is 4 hours in sites.xml. After 5 minutes (or whatever the difference is between the two times), I get the exception copied below. A tarball is attached with the logs, script, etc. replicate.sh shows how to replicate the issue on PADS. Assuming that my problem is the same as the others, it would be good if the fix could be merged to release 0.92, as I'm trying to bundle stable swift releases with SwiftR.
- Tim Swift svn swift-r4336 cog-r3096 (cog modified locally) RunID: 20110526-1317-2c8ybi10 Progress: SwiftScript trace: top of loop: rserver waiting for input on, /tmp/nbest/SwiftR/swift.0827/requestpipe Progress: Active:1 Progress: Finished successfully:1 SwiftScript trace: rserver: got dir, /tmp/nbest/SwiftR/requests.P09626/R0000007 Progress: uninitialized:1 Finished successfully:1 Progress: Submitted:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 Progress: Active:1 Finished successfully:1 queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) Progress: Finished successfully:1 Failed but can retry:1 On Sun, May 22, 2011 at 1:51 PM, Mihael Hategan wrote: > The second one looks to me like a coaster problem. Can't say much about > the first issue. > > Can you try with plain pbs if you want to test the pbs provider? 
> > Mihael > > On Sun, 2011-05-22 at 08:39 -0500, ketan wrote: > > I can confirm that the trunk is not usable for pbs provider. I am using > > trunk for submitting jobs on beagle and I see a few unexpected things: > > > > 1. The stderr is showing inconsistent messages: The results are getting > > written to the output even though stderr doesn't report any. > > 2. qsub jobs being cancelled inadvertantly: I submitted 40 of them > > yesterday, however, only 2 survived today. The log is here: > > > > > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log > > > > In addition, the ssh-pbs provider does not seem to be working for large > > runs (it worked for a small number of test runs): Getting unexpected > > stdouts. Following is the stdout: > > > > http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout > > > > Following is the log file for the above run: > > > > > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log > > > > > > Ketan > > > > On 5/21/11 5:12 PM, Michael Wilde wrote: > > > > > > ----- Original Message ----- > > >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote: > > >>> as I mentioned, I've been running with Mike's swift which was > > >>> patched > > >>> for beagle. are all the things that make running on beagle work in > > >>> trunk? > > >> No idea. > > >> > > >> Mike? > > > Justin, working with Ketan, just applied changes to trunk which should > make it work now on Beagle (or any Cray XT5+ or XE). This uses a different > set of sites.xml tags than the prototype in the current Beagle swift 0.92.1 > module. Justin has a note on this at: > > > https://sites.google.com/site/swiftdevel/sites/pbs/cray > > > > > > It was working before for one-node worker jobs; now it should work for > multi-node worker jobs as well. > > > > > > Justin and Ketan should comment on the state of testing and readiness > of this trunk feature. Don't try trunk on Beagle till they give the > go-ahead. 
> > > > > > - Mike > > > > > >>> If so i'll update to the latest and test. I don't think I'm > > >>> using stable... > > >> Ok > > >> > > >> Mihael
-------------- next part -------------- A non-text attachment was scrubbed... Name: swiftR-fail.tgz Type: application/x-gzip Size: 23917 bytes Desc: not available
From hategan at mcs.anl.gov Thu May 26 15:41:37 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 26 May 2011 13:41:37 -0700 Subject: [Swift-devel] recent error on beagle In-Reply-To: References: <2008592710.93053.1306015923349.JavaMail.root@zimbra.anl.gov> <4DD91203.6090000@gmail.com> <1306090299.2956.1.camel@blabla2.none> Message-ID: <1306442497.16145.1.camel@blabla2.none> Given that this has now been reported a number of times, it may make sense to backport the fix from trunk and make a patch release for 0.92. Objections?
From iraicu at cs.iit.edu Fri May 27 04:30:18 2011 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Fri, 27 May 2011 04:30:18 -0500 Subject: [Swift-devel] CFP: The 12th IEEE/ACM International Conference on Grid Computing (Grid2011) Message-ID: <4DDF6F2A.3030107@cs.iit.edu> Call for Papers *The 12th IEEE/ACM International Conference on Grid Computing (Grid2011)* *Lyon, France Sep 21 -- Sep 23, 2011* http://grid2011.mnm-team.org/ Co-located with the EGI Technical Forum Sponsorship in discussion: * The IEEE Computer Society Technical Committee on Scalable Computing (pending) * Association for Computing Machinery Special Interest Group on Computer Architecture (pending) Grid computing enables the sharing of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a cohesive resource environment for executing distributed applications. The Grid conference series is an annual international meeting that brings together a community of researchers, developers, practitioners, and users involved with Grid technology. The objective of the meeting is to serve as both the premier venue for presenting foremost research results in the area and as a forum for introducing and exploring new concepts. The conference will feature invited talks, workshops, and refereed paper presentations.
Grid 2011 welcomes paper and poster submissions on innovative work from researchers in academia, industry and government describing original research work in grid computing. Previous events in this series have been successful in attracting high quality papers and a wide international participation. This event will be co-located with the EGI Technical Forum.

*SCOPE*
Grid 2011 topics of interest include, but are not limited to:
* Applications, including eScience and eBusiness Applications
* Architectures and Fabrics
* Authentication, Authorization, Auditing and Accounting
* Autonomic and Utility Computing on Global Grids
* Cloud computing
* Cloud, Cluster and Grid Integration
* Creation and Management of Virtual Enterprises and Organizations
* Critical surveys or reflections on the past decade on grid and distributed computing
* Dynamic, Distributed, Data-Intensive Access, Management and Processing
* Energy Efficiency and Grid
* Grid Economy and Business Models
* Infrastructure and Practise of Distributed Computing
* Metadata, Ontologies, and Provenance
* Middleware and Toolkits
* Monitoring, Management and Organization Tools
* Networking
* Performance Measurement and Modelling
* Problem Solving Environments
* Programming Models, Tools and Environments
* Production Cyberinfrastructure
* QoS and SLA Negotiation
* Resource Management, Scheduling, and Runtime Environments
* Scientific, Industrial and Social Implications
* Semantic Grid
* Standardization efforts in Grid
* Virtualization and grid computing

*TECHNICAL PAPERS*
Grid 2011 invites authors to submit original papers (not published or under review elsewhere). Papers should be no more than 8 pages in length (including diagrams and references) and be submitted as a PDF file by using the submission system: URL Submission implies the willingness of at least one of the authors to register and present the paper.
A separate conference proceedings will be published and will also be a part of the IEEE Xplore and the CS digital library. For author instructions visit http://grid2011.mnm-team.org .

*IMPORTANT DATES*
* 10 May 2011 Technical Paper Submission Open
* 08 June 2011 Workshop proposal due
* 15 June 2011 Workshop acceptance notification
* *05 July 2011 Technical Paper Submission due*
* 05 August 2011 Paper Acceptance Notifications
* 15 August 2011 Full and Revised papers due
* 15 August 2011 Poster submissions due
* 25 August 2011 Poster Acceptance Notifications

*CONFERENCE ORGANISATION*
* General Chair: Nils gentschen Felde, MNM-Team / Ludwig-Maximilians-Universität München, Munich
* Local Chair: Frederic Suter, IN2P3, Lyon
* Program Chair: Shantenu Jha, Rutgers, USA and Edinburgh, UK
* Workshop Chair & Poster Chair: Gilles Fedak, INRIA, Lyon
* Proceedings and Publications Chair: Rajkumar Buyya, The University of Melbourne and Manjrasoft, Australia

Program Vice Chairs:
* Clouds and Virtualisation: Roger Barga, Microsoft Research
* Distributed Production Cyberinfrastructure and Middleware: Andrew Grimshaw
* e-Research and Applications: Daniel S. Katz, Univ.
of Chicago and Argonne National Lab, US
* Tools & Services, Resource Management & Runtime Environments: Ramin Yahyapour, Dortmund
* Distributed Data-Intensive Science and Systems: Erwin Laure, KTH, Sweden

Publicity Chairs:
* Suraj Pandey, Univ of Melbourne, Australia
* Yoshiyuki Watase, KEK, Japan
* Cameron Kiddle, Calgary, Canada
* Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA
* Adam Barker, University of St Andrews, UK

Program Committee:
* Andreas Aschenbrenner, Austrian Academy of Sciences
* Ignacio Blanquer, Universidad Politécnica de Valencia, Spain
* Jim Dowling, SICS, Sweden
* Vangelis Floros, GRNET, Greece
* Neil Chue Hong, EPCC, UK
* Patrick Fuhrmann, DESY, DE
* Jens Jensen, STFC, UK
* Peter Kunszt, SystemsX, Switzerland
* Hideo Matsuda, University of Osaka, Japan
* Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA
* Heiko Schuldt, Basel University, Switzerland
* Alex Sim, LBL, US
* Osamu Tatebe, Tsukuba University, Japan
* Domenico Talia, Università della Calabria, Italy
* Erik Elmroth, Umeå University, Sweden
* Bastian Koller, HLRS, Germany
* Rosa Badia, UPC, Spain
* Marco Danelutto, Università di Pisa, Italy
* Frederic Desprez, INRIA-LIP, France
* Thilo Kielmann, Vrije Universiteit, The Netherlands
* Satoshi Matsuoka, Tokyo
* Alan Sill, Texas-Tech, US
* Steven Newhouse, EGI, NL
* Eva Deelman, ISI, USC, US
* Karolina Sarnowska-Upton, Univ. of Virginia, US
* Marty Humphrey, University of Virginia, US
* Rob Gillen, Oak Ridge National Laboratory, US
* Kate Keahey, Argonne National Laboratory, US
* Judy Qiu, Indiana University, US
* Douglas Thain, University of Notre Dame, US
* Jim Myers, Rensselaer Polytechnic Institute, US
* Tevfik Kosar, University at Buffalo, US
* David Abramson, Monash University, Australia
* Chaitanya Baru, San Diego Supercomputer Center, US
* Jaliya Ekanayake, Microsoft Research, US
* Miron Livny, Univ. of Wisconsin, US
* Ian Foster, Univ.
of Chicago, US
* Dieter Kranzlmueller, Ludwig-Maximilians-Universität München, Germany
* Gabrielle Allen, Louisiana State University, USA
* David Bader, Georgia Institute of Technology, USA
* Henri Bal, Vrije Universiteit, Netherlands
* Eloisa Bentivegna, Max Planck Institute for Gravitational Physics, Germany
* Nicolas Kourtellis, University of South Florida, USA
* Patricia Kovatch, University of Tennessee, USA
* Manish Parashar, Rutgers, USA
* Alistair Rendell, Australian National University, Australia
* Richard Sinnott, University of Melbourne, Australia
* Alan Sussman, University of Maryland, USA
* Mark Stillwell, INRIA-Université de Lyon-LIP, France
* David Wallom, Oxford University, UK

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================
From hategan at mcs.anl.gov Fri May 27 13:54:50 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 27 May 2011 11:54:50 -0700 Subject: [Swift-devel] log4j isLevelEnabled() Message-ID: <1306522490.8346.31.camel@blabla2.none>

A bit of trivia on the use of:

if (logger.isDebugEnabled()) {
    logger.debug("There are " + n + " birds on pole " + p);
}

The if statement may look unnecessary, and functionally it is. The main reason for it is performance: if the above appears in some frequently invoked piece of code and debug is disabled, the "if" check is very quick. Without the "if", that same level check still has to happen inside debug(), but debug() is only invoked after its arguments have been constructed, so each call also pays for an additional string creation (which in some cases may involve complex calls to some custom toString()). So without the "if" the code is likely slower and it unnecessarily involves the heap.

The only case in which the guard is the worse choice is when the string passed to debug() is not the result of a concatenation (e.g. logger.debug("No birds")), since the level check is then simply done twice.

From turam at mcs.anl.gov Fri May 27 14:15:00 2011 From: turam at mcs.anl.gov (Thomas Uram) Date: Fri, 27 May 2011 14:15:00 -0500 Subject: [Swift-devel] Status of swiftconfig Message-ID: <31553BCC-730A-44D1-81CB-6180BC965055@mcs.anl.gov>

I need to build swift config files in a fashion to which swiftconfig appears well suited. In trying to run it, using swift 0.92 (Yes, I know about 0.92), I get:

turam at login1 bin 1106$ !sw
swiftconfig -template ssh
Unable to find template for ssh

In this case, what swiftconfig fails to find is this file:

/autonfs/home/turam/dev/swift-0.92/etc/sites/ssh/sites.xml

So it appears to assume that the template names are directories with a sites.xml file thereunder. In fact, the template names are themselves sites files.
I've modified this in swiftconfig, which ends up throwing errors but eventually working, but I wonder: What is the status of swiftconfig? I checked SVN, and this disagreement over whether templates are directories or files persists there. Thanks, Tom
From davidkelly999 at gmail.com Fri May 27 14:48:42 2011 From: davidkelly999 at gmail.com (David Kelly) Date: Fri, 27 May 2011 14:48:42 -0500 Subject: [Swift-devel] Status of swiftconfig In-Reply-To: <31553BCC-730A-44D1-81CB-6180BC965055@mcs.anl.gov> References: <31553BCC-730A-44D1-81CB-6180BC965055@mcs.anl.gov> Message-ID: Hello Tom, I think, in general, swiftconfig is being replaced with another utility called gensites. Gensites and swiftconfig are similar in that they create sites.xml files based on templates. Gensites can replace values in templates with environment variables or with PBS style comments in a swift configuration file. There is some preliminary documentation about how it works at https://sites.google.com/site/swiftguide/home/managingsites. I need to expand that documentation, but hopefully it should give you an idea of how it works. The template issue you are running into I think is due to there being a mix of both swiftconfig and gensites templates in etc/sites. The gensites templates are single files, where the swiftconfig templates are directories. I just cleaned this up a bit in trunk, but 0.92 may still have a strange mix of both. Please let me know if you have any questions about how it works. Thanks, David
-------------- next part -------------- A non-text attachment was scrubbed... Name: gensites Type: application/octet-stream Size: 5479 bytes Desc: not available