From wilde at mcs.anl.gov Tue Jun 1 09:39:09 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 1 Jun 2010 09:39:09 -0500 (CDT) Subject: [Swift-devel] Re: Check tutorial/guide edits in and let student(s) know? In-Reply-To: Message-ID: <7099629.253261275403149687.JavaMail.root@zimbra> David, also to note: there is a procedure to push the content to the web server immediately; it would be good to test if that works for you, for future changes. Also, before I forget: we now (as of about 6 months or so?) have both a development trunk and a stable branch in SVN. We need to decide which of these the online doc content lives in. At some point, we may need a separate doc version for both, and/or for each release. So would be good to plan for the directory structure for that. - Mike ----- "David Kelly" wrote: > Hello Mike, > > I checked in the changes to the tutorial and documentation through SVN > this morning. From what I understand, everything should be updated on > the site by tomorrow. I will send out an email to the other students > to let them know. > > David > > > On Sat, May 29, 2010 at 11:39 AM, Michael Wilde < wilde at mcs.anl.gov > > wrote: > > > David, since the current online tutorial is already broken, and yours > is likely a big improvement, you should go ahead and check it in and > make sure it gets posted to the live guides. (might need to add you to > some unix group for that to work) > > Then work with the other summer students and the student that posted > to the list on Friday, to get some feedback as to whether the tutorial > is now error-free. > > Another thing that occurs: as the set of posted examples/tutorials > grows, can we create a "recipe index" that indexes the examples by a > categorized outline of commonly needed techniques and FAQs? 
> > Lastly: We may want to separate out examples (mainly for enhancing the > user guide) from tutorials, where the latter would be mainly formatted > as exercises that one could actually walk through, as opposed to > examples that one simple reads, copies and tries. > > The latter (tutorials) are much more work and harder to test, and thus > we could simply enhance the tutorials that already exist while we > focus more on writing a larger set of tested and annotated examples. > > - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Tue Jun 1 09:44:37 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Tue, 1 Jun 2010 09:44:37 -0500 (CDT) Subject: [Swift-devel] Possible bug with iteration in stable branch In-Reply-To: <28725618.253561275403368403.JavaMail.root@zimbra> Message-ID: <6681855.253671275403477468.JavaMail.root@zimbra> David, you can find some discussion on this issue in the swift-devel and/or swift-user archives. Im forwarding two relevant messages on it, below. We should review these and decide if this issue should just be documented for now, including an example of the typical problem and its remedy. We need to think about whether this behavior could and should be improved. - Mike ----- Forwarded Message ----- From: "Ben Clifford" To: "Erin Hodgess" Cc: swift-user at ci.uchicago.edu Sent: Tuesday, June 16, 2009 5:26:25 PM GMT -06:00 US/Canada Central Subject: Re: [Swift-user] still stuck on fold9.swift On Tue, 16 Jun 2009, Hodgess, Erin wrote: > Ok. We're back to fold9.swift again, but it's saying that there are > multiple writers. > These are in /home/erin/swift1. > > Is it because of the recursive nature of the a[v+1] setup, maybe? 
This is a static compile time analysis problem - Swift looks at the source
code and sees that you are assigning to the a[] array in one place (the
a[0] statement, outside of iterate) and again in another place (the a[v+1]
place inside of iterate).

It's bothered me in the past that this hasn't worked, but I hadn't realised
that it did at one stage actually work (which it must have done to be
written in the tutorial).

It's probably useful to file a bug about this, then - it's a comment I have
from doing things with the 3rd provenance challenge over the past couple of
months too.

I think in the long term it's a use that should be accepted, but the syntax
of Swift makes this kind of analysis incredibly awkward to get right.

You can maybe work around this with something like this (untested):

replace

  a[v+1] = countstep(a[v]);

with

  if(v==0) {
    a[v+1] = countstep(startfile);
  } else {
    a[v+1] = countstep(a[v]);
  }

and replace the a[0] assignment line with:

  countfile startfile = echo("793578934574893");

--

_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

----

----- Forwarded Message -----
From: "Michael Wilde"
To: "Tiberiu Stef-Praun"
Cc: "swift-devel" , swift-user at ci.uchicago.edu
Sent: Wednesday, February 17, 2010 10:56:33 AM GMT -06:00 US/Canada Central
Subject: Re: [Swift-devel] Problem with iterate

Tibi, while it may not be the most elegant approach, I think you need to put
all statements that set the array statePathFiles *inside* the loop, and
conditionally execute the setting of element 0 using an if statement.
- Mike ----- "Tiberiu Stef-Praun" wrote: > Hi Guys > > I have put together this really simple iteration example, which I > expected to work (and it used to work in the past): > > ============== > type file; > > app (file initOut) initFunc (string inputString){ > echo inputString stdout=@filename(initOut); > } > > app (file catOut) catFunc (file catIn){ > cat @filename(catIn) @filename(catOut); > } > > runLoop(){ > > file statePathFiles[] suffix=".mat">; > > statePathFiles[0]=initFunc("hello"); > > iterate it{ > trace(@strcat("Iteration: ",it)); > statePathFiles[it+1]=catFunc(statePathFiles[it]); > } until (it==0); > } > > runLoop(); > ================ > > However, I get an error like this: > > Running UC Eval > Could not start execution. > variable statePathFiles has multiple writers. > > > Please suggest solutions for it . > > Thank you > Tibi > > > > -- > Tiberiu (Tibi) Stef-Praun, PhD > Computational Sciences Researcher > Computation Institute > 5640 S. Ellis Ave, #405 > University of Chicago > http://www-unix.mcs.anl.gov/~tiberius/ > > ____________________ -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory ----- "David Kelly" wrote: > Hello, > > I believe there may be a bug with iteration in the stable branch of > Swift. 
Below is the code I am using (which came from the swift
> tutorial):
>
> iterate.swift
> -----------------
> type counterfile;
>
> (counterfile t) echo(string m) {
>   app {
>     echo m stdout=@filename(t);
>   }
> }
>
> (counterfile t) countstep(counterfile i) {
>   app {
>     wcl @filename(i) @filename(t);
>   }
> }
>
> counterfile a[] ;
>
> a[0] = echo("793578934574893");
>
> iterate v {
>   a[v+1] = countstep(a[v]);
>   trace("extract int value ", @extractint(a[v+1]));
> } until (@extractint(a[v+1]) <= 1);
> ----------------
>
> wcl
> ---------
> #!/bin/bash
> echo -n $(wc -c < $1) > $2
> ---------
>
> Using the development version of swift the script works correctly,
> with the following output:
>
> Swift svn swift-r3335 cog-r2752
>
> RunID: 20100528-1243-6on02joa
> Progress:
> SwiftScript trace: extract int value , 16.0
> SwiftScript trace: extract int value , 2.0
> SwiftScript trace: extract int value , 1.0
> Final status: Finished successfully:4
>
> However, when I use the stable branch (either using the tar.gz or by
> downloading from svn) I get:
>
> Could not start execution.
> variable a has multiple writers.
> > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Tue Jun 1 09:59:22 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Tue, 1 Jun 2010 09:59:22 -0500 (CDT) Subject: [Swift-devel] Fixing and testing Swift on local and common clusters In-Reply-To: <10078930.254631275404305414.JavaMail.root@zimbra> Message-ID: <17802695.254721275404362695.JavaMail.root@zimbra> Hi David, Jon, Arjun, Dennis, This week I'd like to have us all coordinate around trying, fixing, documenting, and testing the behavior of Swift on several clusters that many local users use: TeraPort, PADS, Fusion, Abe, QueenBee, Ranger, Godzilla, and SisBoomBah I'll explain more later in the day and week, but I wanted to give you a heads-up on this. We'll do this in a way that ties in to each of your different project focus areas. The first 5 of these clusters are PBS, the last 3 are SGE. On the PBS cluster, the issue is that the clusters vary in scheduling policy (allocating resources in cores vs nodes). On the latter 3, our SGE driver is fairly new, and still has issues related to interpretation of cores per node and of the local scheduler configuration. First exercise in this regard is to focus on the first 3 clusters on the list above, and observe the behavior both with and without coasters, with both the stable branch and a locally modified version that I will point you to. David, how well are the exercises oriented to cluster usage at this point? Lets discuss what the new-user roadmap should be for trying Swift on a cluster, and how to provide the needed environment-specific info for CI and Argonne users vs the broader user community. 
- Mike From wilde at mcs.anl.gov Tue Jun 1 22:06:42 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 1 Jun 2010 22:06:42 -0500 (CDT) Subject: [Swift-devel] OSG Client installation guide Message-ID: <10169258.286701275448002836.JavaMail.root@zimbra> The OSG Client installation guide is at: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/ClientInstallationGuide For use when we get to that stage. - Mike From wilde at mcs.anl.gov Tue Jun 1 22:59:31 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Tue, 1 Jun 2010 22:59:31 -0500 (CDT) Subject: [Swift-devel] Daily skype conference for student work In-Reply-To: <23668942.287061275450493394.JavaMail.root@zimbra> Message-ID: <30043520.287101275451171883.JavaMail.root@zimbra> Hi All, I'd like to try a daily telecon to get our student projects started. This isnt meant to replace constant discussion on the swift-devel list, but rather as a way to accelerate the startup phase of your work and address questions that are on your mind, and to work around any problems that are blocking you. Lets try for 1100 CDT (1200 EDT; 1300 Brazil BRT). On occasion I wont be able to join, but I'll do my best to be a regular. I think it makes sense to hold the call whether Im available or not. We'll try to keep it under 30 mins, with 15 mins being more ideal, and longer being OK if the call is productive. After a few weeks we'll assess how helpful this is. Justin indicated he wanted to join the call; anyone else working on Swift (using or developing) is welcome. Lets try to do our first call this Thu, and use Skype so Thiago can hopefully join without phone charges. Will this work for all 5 students, Justin, and any others who want to join? I'll see if I can set up a skype conference that people can just join in, ideally. Else I or another host will need to connect to all participants. 
- Mike From wozniak at mcs.anl.gov Wed Jun 2 09:29:44 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 2 Jun 2010 09:29:44 -0500 (CDT) Subject: [Swift-devel] Daily skype conference for student work In-Reply-To: <30043520.287101275451171883.JavaMail.root@zimbra> References: <30043520.287101275451171883.JavaMail.root@zimbra> Message-ID: On Tue, 1 Jun 2010, wilde at mcs.anl.gov wrote: > Will this work for all 5 students, Justin, and any others who want to > join? I'll see if I can set up a skype conference that people can just > join in, ideally. Else I or another host will need to connect to all > participants. Works for me. -- Justin M Wozniak From wozniak at mcs.anl.gov Wed Jun 2 12:48:30 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Wed, 2 Jun 2010 12:48:30 -0500 (CDT) Subject: [Swift-devel] coaster worker syntax errors In-Reply-To: References: Message-ID: I came across the same issues and have some preliminary fixes- more to come... On Mon, 24 May 2010, Allan Espinosa wrote: > I would like to use trunk to be able to use the CDM features when > running on OSG resources. In the meantime i'll tryout cog-stable and > swift-trunk. > > swift-r3288 cog-r2752 > > RunID: 20100524-1955-h9to7jt4 > Progress: > Progress: Selecting site:1 Initializing site shared directory:1 Stage in:1 > Progress: Submitting:2 Submitted:1 > Progress: Submitted:2 Active:1 > Failed to transfer wrapper log from > 066-many-20100524-1955-h9to7jt4/info/n on TERAPORT > Progress: Submitted:2 Failed:1 > Execution failed: > Exception in sleep: > Arguments: [300] > Host: TERAPORT > Directory: 066-many-20100524-1955-h9to7jt4/jobs/n/sleep-nivmsfsj > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > Task failed: 0524-550712-000000 Block task ended prematurely > Use of uninitialized value in concatenation (.) or string at > /home/osgvo/engage/.globus/coasters/cscript2747241842007159708.pl line > 192. 
> Failed to connect: Illegal seek at
> /home/osgvo/engage/.globus/coasters/cscript2747241842007159708.pl line
> 169.
>
>
> Cleaning up...
> Shutting down service at https://128.135.125.118:57300
> Got channel MetaChannel: 988943951 -> GSSSChannel-01884231335(1)
> + Done

--
Justin M Wozniak

From thiago.manel at gmail.com Wed Jun 2 12:50:44 2010
From: thiago.manel at gmail.com (Thiago Manel)
Date: Wed, 2 Jun 2010 14:50:44 -0300
Subject: [Swift-devel] Daily skype conference for student work
In-Reply-To:
References: <30043520.287101275451171883.JavaMail.root@zimbra>
Message-ID:

Hi guys,

I cannot make it tomorrow (it is a holiday here), but I'll join you at the
next opportunity.

On Wed, Jun 2, 2010 at 11:29 AM, Justin M Wozniak wrote:

> On Tue, 1 Jun 2010, wilde at mcs.anl.gov wrote:
>
>> Will this work for all 5 students, Justin, and any others who want to
>> join? I'll see if I can set up a skype conference that people can just join
>> in, ideally. Else I or another host will need to connect to all
>> participants.
>>
>
> Works for me.
>
> --
> Justin M Wozniak
>

--
Thiago Emmanuel Pereira da Cunha Silva
-----------------------------------------------
www.lsd.ufcg.edu.br/~thiagoepdc
silibrina.blogspot.com
-----------------------------------------------
Campinenses de todos os países, uni-vos

From aespinosa at cs.uchicago.edu Wed Jun 2 16:45:08 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 2 Jun 2010 16:45:08 -0500
Subject: [Swift-devel] Heap space being exhausted
In-Reply-To: <5036170442587741713@unknownmsgid>
References: <5036170442587741713@unknownmsgid>
Message-ID: <20100602214508.GA18156@communicado.ci.uchicago.edu>

Hi,

Ramping up my workflow to 400k+ jobs, I now encounter heap errors. In these
runs, the HEAPMAX variable is set to 1024M.

Attached is the tarball describing the session output and log files.
-Allan -------------- next part -------------- A non-text attachment was scrubbed... Name: heap_errors.tar.gz Type: application/x-gzip Size: 5936680 bytes Desc: not available URL: From aespinosa at cs.uchicago.edu Wed Jun 2 17:32:47 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 2 Jun 2010 17:32:47 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: References: <5036170442587741713@unknownmsgid> Message-ID: btw foreach.maxthreads=1024 2010/6/2 Allan Espinosa : > Hi, > > Ramping up my workflow to 400k+ jobs, I now encounter heap errors. In these > runs, the HEAPMAX variable is set to 1024M. > > Attached is the tarball describing the session output and log files. > > -Allan From hategan at mcs.anl.gov Wed Jun 2 17:47:04 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 02 Jun 2010 17:47:04 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: References: <5036170442587741713@unknownmsgid> Message-ID: <1275518824.13554.0.camel@blabla2.none> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: > btw > > foreach.maxthreads=1024 or more heapmax. > > 2010/6/2 Allan Espinosa : > > Hi, > > > > Ramping up my workflow to 400k+ jobs, I now encounter heap errors. In these > > runs, the HEAPMAX variable is set to 1024M. > > > > Attached is the tarball describing the session output and log files. 
> > > > -Allan > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Wed Jun 2 17:48:23 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 02 Jun 2010 17:48:23 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: <1275518824.13554.0.camel@blabla2.none> References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> Message-ID: <1275518903.13647.0.camel@blabla2.none> On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote: > On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: > > btw > > > > foreach.maxthreads=1024 > > or more heapmax. Ehm, I though you found a solution :) What's the swift script? From aespinosa at cs.uchicago.edu Wed Jun 2 18:40:19 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 2 Jun 2010 18:40:19 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: <1275518903.13647.0.camel@blabla2.none> References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> Message-ID: I tried a HEAPMAX of 4GB. No memory problems so far :) 2010/6/2 Mihael Hategan : > On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote: >> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: >> > btw >> > >> > foreach.maxthreads=1024 >> >> or more heapmax. > > Ehm, I though you found a solution :) > > What's the swift script? 
From hategan at mcs.anl.gov Wed Jun 2 18:57:46 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 02 Jun 2010 18:57:46 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> Message-ID: <1275523066.15361.0.camel@blabla2.none> On Wed, 2010-06-02 at 18:40 -0500, Allan Espinosa wrote: > I tried a HEAPMAX of 4GB. > > No memory problems so far :) Still odd. What's the swift script? I'm asking because foreach.max.threads should work, but it applies to each individual foreach rather than globally. > > 2010/6/2 Mihael Hategan : > > On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote: > >> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: > >> > btw > >> > > >> > foreach.maxthreads=1024 > >> > >> or more heapmax. > > > > Ehm, I though you found a solution :) > > > > What's the swift script? From aespinosa at cs.uchicago.edu Wed Jun 2 19:03:39 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 2 Jun 2010 19:03:39 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: <1275523066.15361.0.camel@blabla2.none> References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> <1275523066.15361.0.camel@blabla2.none> Message-ID: My Cybershake workflow. its basically a 2 level for loop with varying inner loop sizes. foreach i in (~4k elements) { x = f(); foreach (20-2k elements) { ... } } 2010/6/2 Mihael Hategan : > On Wed, 2010-06-02 at 18:40 -0500, Allan Espinosa wrote: >> I tried a HEAPMAX of 4GB. >> >> No memory problems so far :) > > Still odd. What's the swift script? > > I'm asking because foreach.max.threads should work, but it applies to > each individual foreach rather than globally. 
> >> >> 2010/6/2 Mihael Hategan : >> > On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote: >> >> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: >> >> > btw >> >> > >> >> > foreach.maxthreads=1024 >> >> >> >> or more heapmax. >> > >> > Ehm, I though you found a solution :) >> > >> > What's the swift script? > From hategan at mcs.anl.gov Wed Jun 2 19:27:24 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 02 Jun 2010 19:27:24 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> <1275523066.15361.0.camel@blabla2.none> Message-ID: <1275524844.16375.1.camel@blabla2.none> On Wed, 2010-06-02 at 19:03 -0500, Allan Espinosa wrote: > My Cybershake workflow. its basically a 2 level for loop with varying > inner loop sizes. > > foreach i in (~4k elements) { > x = f(); > foreach (20-2k elements) { > ... > } > } Yep. You have a winner. Max threads = 1024 * 1024. You should adjust that parameter accordingly. I.e. foreach.max.threads = sqrt(maxTotalThreads). > > 2010/6/2 Mihael Hategan : > > On Wed, 2010-06-02 at 18:40 -0500, Allan Espinosa wrote: > >> I tried a HEAPMAX of 4GB. > >> > >> No memory problems so far :) > > > > Still odd. What's the swift script? > > > > I'm asking because foreach.max.threads should work, but it applies to > > each individual foreach rather than globally. > > > >> > >> 2010/6/2 Mihael Hategan : > >> > On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote: > >> >> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote: > >> >> > btw > >> >> > > >> >> > foreach.maxthreads=1024 > >> >> > >> >> or more heapmax. > >> > > >> > Ehm, I though you found a solution :) > >> > > >> > What's the swift script? 
> >

From aespinosa at cs.uchicago.edu Wed Jun 2 21:47:06 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 2 Jun 2010 21:47:06 -0500
Subject: [Swift-devel] Re: Heap space being exhausted
In-Reply-To: <1275524844.16375.1.camel@blabla2.none>
References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> <1275523066.15361.0.camel@blabla2.none> <1275524844.16375.1.camel@blabla2.none>
Message-ID:

Ahhh, I always thought it's for all the foreach statements. Thanks for that
clarification! :)

-Allan

2010/6/2 Mihael Hategan :
> On Wed, 2010-06-02 at 19:03 -0500, Allan Espinosa wrote:
>> My Cybershake workflow. its basically a 2 level for loop with varying
>> inner loop sizes.
>>
>> foreach i in (~4k elements) {
>>   x = f();
>>   foreach (20-2k elements) {
>>     ...
>>   }
>> }
>
> Yep. You have a winner. Max threads = 1024 * 1024.
>
> You should adjust that parameter accordingly. I.e. foreach.max.threads =
> sqrt(maxTotalThreads).
>
>>
>> 2010/6/2 Mihael Hategan :
>> > On Wed, 2010-06-02 at 18:40 -0500, Allan Espinosa wrote:
>> >> I tried a HEAPMAX of 4GB.
>> >>
>> >> No memory problems so far :)
>> >
>> > Still odd. What's the swift script?
>> >
>> > I'm asking because foreach.max.threads should work, but it applies to
>> > each individual foreach rather than globally.
>> >
>> >>
>> >> 2010/6/2 Mihael Hategan :
>> >> > On Wed, 2010-06-02 at 17:47 -0500, Mihael Hategan wrote:
>> >> >> On Wed, 2010-06-02 at 17:32 -0500, Allan Espinosa wrote:
>> >> >> > btw
>> >> >> >
>> >> >> > foreach.maxthreads=1024
>> >> >>
>> >> >> or more heapmax.
>> >> >
>> >> > Ehm, I though you found a solution :)
>> >> >
>> >> > What's the swift script?
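
The sizing rule quoted above (foreach.max.threads = sqrt(maxTotalThreads)) can be
worked through in a few lines of Python. This is only a sketch: per_foreach_limit
and nesting_depth are hypothetical names, not part of Swift, and the extension to
more than two nesting levels is an assumption that generalises the two-level case
discussed in the thread.

```python
def per_foreach_limit(max_total_threads, nesting_depth=2):
    """Largest per-foreach limit L with L ** nesting_depth <= max_total_threads.

    Hypothetical helper, not a Swift API. It assumes the limit applies to
    each foreach instance separately, so nested loops multiply.
    """
    if max_total_threads < 1 or nesting_depth < 1:
        raise ValueError("both arguments must be positive")
    # Start from the floating-point root, then correct any rounding error
    # with exact integer arithmetic.
    limit = max(1, round(max_total_threads ** (1.0 / nesting_depth)))
    while limit > 1 and limit ** nesting_depth > max_total_threads:
        limit -= 1
    while (limit + 1) ** nesting_depth <= max_total_threads:
        limit += 1
    return limit


if __name__ == "__main__":
    # A limit of 1024 on two nested foreach loops allows up to 1024 * 1024.
    print(1024 * 1024)
    # Per-foreach limit that keeps two nested loops within about 1024 threads.
    print(per_foreach_limit(1024))
```

For a two-level loop and a total budget of 1024 threads the helper returns 32,
which is the square root Mihael suggests.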
From hategan at mcs.anl.gov Thu Jun 3 01:20:04 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 03 Jun 2010 01:20:04 -0500 Subject: [Swift-devel] Re: Heap space being exhausted In-Reply-To: References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> <1275523066.15361.0.camel@blabla2.none> <1275524844.16375.1.camel@blabla2.none> Message-ID: <1275546004.23084.21.camel@blabla2.none> On Wed, 2010-06-02 at 21:47 -0500, Allan Espinosa wrote: > Ahhh, I always thought its for all the foreach statemtns. No. That would open the door to unpredictable deadlocks. Imagine consumer and producer, consumer filling the thread count and waiting on data from producer who would be unable to produce because of the thread limit. From wilde at mcs.anl.gov Thu Jun 3 10:55:42 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 3 Jun 2010 10:55:42 -0500 (CDT) Subject: [Swift-devel] Re: Daily skype conference for student work In-Reply-To: <30043520.287101275451171883.JavaMail.root@zimbra> Message-ID: <30006582.342011275580542168.JavaMail.root@zimbra> For todays call I'll connect to Arjun, David, Ioan, Jon, Justin, and Thiago. Dennis will start (and join the calls) next week. If anyone wants to join the call, text me in skype, and I'll connect to you. Send me email if Im not already in your contacts. I havent seen a way yet to post a skype conference that you can connect in to. We'll have to see how the voice quality works. - Mike ----- wilde at mcs.anl.gov wrote: > Hi All, > > I'd like to try a daily telecon to get our student projects started. > This isnt meant to replace constant discussion on the swift-devel > list, but rather as a way to accelerate the startup phase of your work > and address questions that are on your mind, and to work around any > problems that are blocking you. > > Lets try for 1100 CDT (1200 EDT; 1300 Brazil BRT). 
On occasion I wont > be able to join, but I'll do my best to be a regular. I think it makes > sense to hold the call whether Im available or not. > > We'll try to keep it under 30 mins, with 15 mins being more ideal, and > longer being OK if the call is productive. > > After a few weeks we'll assess how helpful this is. > > Justin indicated he wanted to join the call; anyone else working on > Swift (using or developing) is welcome. > > Lets try to do our first call this Thu, and use Skype so Thiago can > hopefully join without phone charges. > > Will this work for all 5 students, Justin, and any others who want to > join? I'll see if I can set up a skype conference that people can just > join in, ideally. Else I or another host will need to connect to all > participants. > > - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From dk0966 at cs.ship.edu Thu Jun 3 11:50:19 2010 From: dk0966 at cs.ship.edu (David Kelly) Date: Thu, 3 Jun 2010 12:50:19 -0400 Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs In-Reply-To: References: <1274982181.6126.2.camel@blabla2.none> <13507461.162241274983748198.JavaMail.root@zimbra> Message-ID: I have updated the wiki for maintaining swift web content. I changed it to reflect the problems that I ran into last week along with their solutions. In case anyone is interested it can be found at: http://www.ci.uchicago.edu/wiki/bin/view/SWFT/MaintainingSwiftWebContent On Thu, May 27, 2010 at 2:24 PM, David Kelly wrote: > Thanks, that did the trick. PDFs are now being created. The diamond issue > in the PHPs was also corrected by your patch. Everything looks good now. > I'll update the wiki with these instructions. > > Regards, > David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Thu Jun 3 12:15:12 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 3 Jun 2010 12:15:12 -0500 (CDT) Subject: [Swift-devel] Re: Notes on maintaining the swift web and docs In-Reply-To: Message-ID: <6217204.347671275585312545.JavaMail.root@zimbra> Cool, thanks, David. This page and the procedure could use further refinement I think. Things like: - put all tools needed into svn (or a method in SVN to obtain them, not depending on anyone's private login) - clarify/improve the procedure to make the pages locally testable; best to just make all the pages naturally relative to the local copy, so no changes needed before committing - test this for the whole swift web, not just the doc pages. Does that already work? (note that our main page is woefully out of date, and should be spruced up this summer; I'll start on that if I can easily test changes before committing). Thanks, Mike ----- "David Kelly" wrote: > I have updated the wiki for maintaining swift web content. I changed > it to reflect the problems that I ran into last week along with their > solutions. In case anyone is interested it can be found at: > > http://www.ci.uchicago.edu/wiki/bin/view/SWFT/MaintainingSwiftWebContent > > > On Thu, May 27, 2010 at 2:24 PM, David Kelly < dk0966 at cs.ship.edu > > wrote: > > > Thanks, that did the trick. PDFs are now being created. The diamond > issue in the PHPs was also corrected by your patch. Everything looks > good now. I'll update the wiki with these instructions. 
> > Regards, > David -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Thu Jun 3 14:23:23 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Thu, 3 Jun 2010 14:23:23 -0500 (CDT) Subject: [Swift-devel] Error in pbs job submission with coasters in PADS fast queue In-Reply-To: <22970102.357361275592894288.JavaMail.root@zimbra> Message-ID: <6948335.357491275593003465.JavaMail.root@zimbra> Wenjun, The error that was causing your "cat.swift" script to fail in the PADS fast queue as soon as the swift script generates more jobs than "workersPerNode" is that that fast queue is limited to one node. Trapping the pbs submit script (I think debug=true in the login1$ qsub PBS5612813565711306842.submit which has: #PBS -l nodes=2 #PBS -q fast Gives: qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max nodect requirement We (swift developers) need to find out why the error message wasn't made prominent. We need to both document this limitation, make sure the error message gets to the user, and document procedures for finding and debugging Swift's generated .submit files. - Mike From bugzilla-daemon at mcs.anl.gov Thu Jun 3 14:29:26 2010 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 3 Jun 2010 14:29:26 -0500 (CDT) Subject: [Swift-devel] [Bug 224] New: PBS job submission when maxnodes greater than queue limit Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=224 Summary: PBS job submission when maxnodes greater than queue limit Product: Swift Version: unspecified Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Specific site issues AssignedTo: dk0966 at cs.ship.edu ReportedBy: wilde at mcs.anl.gov Swift scripts will fail in the PADS "fast" queue as soon as the swift script generates more jobs than "workersPerNode" because the fast queue is limited to one node. 
Trapping the pbs submit script (I think debug=true in the login1$ qsub PBS5612813565711306842.submit which has: #PBS -l nodes=2 #PBS -q fast Gives: qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max nodect requirement We (swift developers) need to find out why the error message wasn't made prominent. We need to both document this limitation, make sure the error message gets to the user, and document procedures for finding and debugging Swift's generated .submit files. -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the reporter. From wilde at mcs.anl.gov Thu Jun 3 14:42:39 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Thu, 3 Jun 2010 14:42:39 -0500 (CDT) Subject: [Swift-devel] Re: Error in pbs job submission with coasters in PADS fast queue In-Reply-To: <4821688.358121275593616047.JavaMail.root@zimbra> Message-ID: <9151295.358361275594159580.JavaMail.root@zimbra> Forgot to add: for now, a remedy is to change your config: 3000 8 1 1 10 short 0.5 10000 /home/wilde/swift/lab/wwjbug.2010.0602 from queue fast to queue "short" or to use slots 10 and maxnodes 1 This will enable you (and me) to pop up one level and debug the original coaster reply timeout problem that we were trying to re-create here. - Mike ----- wilde at mcs.anl.gov wrote: > Wenjun, > > The error that was causing your "cat.swift" script to fail in the PADS > fast queue as soon as the swift script generates more jobs than > "workersPerNode" is that that fast queue is limited to one node. > > Trapping the pbs submit script (I think debug=true in the > > login1$ qsub PBS5612813565711306842.submit > > which has: > #PBS -l nodes=2 > #PBS -q fast > > Gives: > > qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max > nodect requirement > > We (swift developers) need to find out why the error message wasn't > made prominent. 
>
> We need to both document this limitation, make sure the error message
> gets to the user, and document procedures for finding and debugging
> Swift's generated .submit files.
>
> - Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From aespinosa at cs.uchicago.edu Thu Jun 3 19:07:30 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Thu, 3 Jun 2010 19:07:30 -0500
Subject: [Swift-devel] Re: Heap space being exhausted
In-Reply-To: <1275546004.23084.21.camel@blabla2.none>
References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> <1275523066.15361.0.camel@blabla2.none> <1275524844.16375.1.camel@blabla2.none> <1275546004.23084.21.camel@blabla2.none>
Message-ID:

I see. So in this workflow there will be a slow start:

foreach ai,i in a[] {
  y[i] = f(x[i]);
  foreach bj,j in b[] {
    z[j] = g(y[i], bj);
  }
}

At the start the number of threads will be <= maxthreads; then, when
some y[i]'s become available, there will be up to maxthreads*maxthreads
threads running.

2010/6/3 Mihael Hategan :
> On Wed, 2010-06-02 at 21:47 -0500, Allan Espinosa wrote:
>> Ahhh, I always thought it's for all the foreach statements.
>
> No. That would open the door to unpredictable deadlocks. Imagine
> consumer and producer, consumer filling the thread count and waiting on
> data from producer who would be unable to produce because of the thread
> limit.
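
The arithmetic in the message above can be sketched roughly as follows. This is Python rather than SwiftScript so that it can be run on its own, and it assumes (as described in this thread) that the thread limit applies to each foreach instance separately. The function name max_concurrent_threads and the parameter max_threads are made up for illustration; they are not part of Swift or Karajan.

```python
def max_concurrent_threads(max_threads, len_a, len_b):
    """Upper bounds on active threads for a foreach over a[] that nests
    a foreach over b[].

    Assumption (hypothetical model, not Swift's actual scheduler): every
    foreach instance is throttled separately to max_threads iterations.
    Returns (outer_bound, inner_bound).
    """
    # The outer foreach is a single instance, so at most max_threads
    # of its iterations are active at once.
    outer_active = min(max_threads, len_a)
    # Each active outer iteration owns its own inner foreach instance,
    # and each of those is again limited to max_threads iterations.
    inner_per_outer = min(max_threads, len_b)
    return outer_active, outer_active * inner_per_outer


outer, inner = max_concurrent_threads(max_threads=4, len_a=100, len_b=100)
print(outer, inner)
```

With a limit of 4 and large arrays, the sketch gives at most 4 outer iterations in flight at first, and up to 4*4 inner iterations once the y[i] values start arriving, which is the "slow start" described above.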
From hategan at mcs.anl.gov Thu Jun 3 21:02:47 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 03 Jun 2010 21:02:47 -0500
Subject: [Swift-devel] Re: Heap space being exhausted
In-Reply-To:
References: <5036170442587741713@unknownmsgid> <1275518824.13554.0.camel@blabla2.none> <1275518903.13647.0.camel@blabla2.none> <1275523066.15361.0.camel@blabla2.none> <1275524844.16375.1.camel@blabla2.none> <1275546004.23084.21.camel@blabla2.none>
Message-ID: <1275616967.833.7.camel@blabla2.none>

On Thu, 2010-06-03 at 19:07 -0500, Allan Espinosa wrote:
> I see. so in this workflow there will be a slow start:
>
> foreach ai,i in a[] {
>   y[i] = f(x[i]);
>   foreach bj,j in b[] {
>     z[j] = g(y[i], bj);
>   }
> }
>
> at the start the number of threads will be <= maxthreads, then when
> some y[i]'s become available there will now be maxthreads*maxthreads
> threads running.

Right. The restriction is on each foreach instance. Basically each
different scope is an instance. I.e. think of a foreach as a parallelFor
in Karajan, which defines a different scope for each iteration.

From mandaya at rose-hulman.edu Fri Jun 4 11:33:12 2010
From: mandaya at rose-hulman.edu (Arjun Comar)
Date: Fri, 4 Jun 2010 11:33:12 -0500
Subject: [Swift-devel] Progress so far
Message-ID:

Hey all,

I haven't been able to talk in the last two Skype calls for the summer
students' conference calls, so Mike asked me to mail the list with what
I've been working on, etc.

So what I've done so far: I've mostly been going through Swift/OSG/etc.
tutorials with the goal of getting myself up to speed as best as I can,
as quickly as possible. I've also spent a bunch of time getting the
requisite certificates, authentications, etc., and getting them installed
in the appropriate places. It took longer than I expected, but the
problems I had were more along the lines of me trying to do things
manually when I needed to go through web forms (like with ssh keys).
I was trying to stick them directly in the appropriate files, but that's
not an allowed approach.

What I'm working on today: I'd really like to get remote execution of
swift scripts working, and hopefully execution on multiple remote
machines. I wanted to get it working yesterday but the OSG tutorial took
up most of my time.

Problems thus far: Nothing major, just a few things where I expected
things to work a certain way but they in fact worked a different way.
Nothing google and reading the proper FAQs couldn't solve. We'll see if I
run into anything major today.

--
Arjun Comar, Rose-Hulman '12
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wozniak at mcs.anl.gov Fri Jun 4 13:46:12 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 4 Jun 2010 13:46:12 -0500 (Central Daylight Time)
Subject: [Swift-devel] coaster worker syntax errors
In-Reply-To:
References:
Message-ID:

This should be fixed in the latest CoG.

On Wed, 2 Jun 2010, Justin M Wozniak wrote:

> I came across the same issues and have some preliminary fixes- more to
> come...
>
> On Mon, 24 May 2010, Allan Espinosa wrote:
>
>> I would like to use trunk to be able to use the CDM features when
>> running on OSG resources. In the meantime i'll tryout cog-stable and
>> swift-trunk.
>>
>> swift-r3288 cog-r2752
>>
>> RunID: 20100524-1955-h9to7jt4
>> Progress:
>> Progress: Selecting site:1 Initializing site shared directory:1 Stage
>> in:1
>> Progress: Submitting:2 Submitted:1
>> Progress: Submitted:2 Active:1
>> Failed to transfer wrapper log from
>> 066-many-20100524-1955-h9to7jt4/info/n on TERAPORT
>> Progress: Submitted:2 Failed:1
>> Execution failed:
>> Exception in sleep:
>> Arguments: [300]
>> Host: TERAPORT
>> Directory: 066-many-20100524-1955-h9to7jt4/jobs/n/sleep-nivmsfsj
>> stderr.txt:
>>
>> stdout.txt:
>>
>> ----
>>
>> Caused by:
>> Task failed: 0524-550712-000000 Block task ended prematurely
>> Use of uninitialized value in concatenation (.)
or string at
>> /home/osgvo/engage/.globus/coasters/cscript2747241842007159708.pl line
>> 192.
>> Failed to connect: Illegal seek at
>> /home/osgvo/engage/.globus/coasters/cscript2747241842007159708.pl line
>> 169.
>>
>>
>> Cleaning up...
>> Shutting down service at https://128.135.125.118:57300
>> Got channel MetaChannel: 988943951 -> GSSSChannel-01884231335(1)
>> + Done

--
Justin M Wozniak

From wilde at mcs.anl.gov Fri Jun 4 21:00:56 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Fri, 4 Jun 2010 21:00:56 -0500 (CDT)
Subject: [Swift-devel] Using coaster provider with jobmanager ssh:pbs
In-Reply-To: <13190989.410111275702777947.JavaMail.root@zimbra>
Message-ID: <12696451.410261275703256233.JavaMail.root@zimbra>

When you use this configuration for running jobs from a submit host to a
PBS cluster using ssh to launch the coaster service on the PBS login
host, you need to create a GSI proxy (using grid-proxy-init) on both the
client and on the remote login host, using the same certificate.

3000
8
1
1
1
fast
0.5
10000

/home/wilde/swift/lab

Arjun, this is, I think, what was causing your workflow to fail.

I thought, that in the past, we used to get at least a GSI (grid
security infrastructure) error in the detailed log file. But I don't see
that in this case.

Let me know if creating proxies on both sides works for you. Be sure to
create it on the right PADS login host.

David and Arjun, can you coordinate on integrating this use case into
the tutorial (and eventually the Users Guide)? I suggested we do a
series of "profiles" (with diagrams) to show the various ways of running
Swift locally and remotely, and provide accompanying site file entries.
Dennis, when you get started next week and try these cases, we'll want
to find a way to do automated tests for them.
Thanks, Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From mandaya at rose-hulman.edu Fri Jun 4 23:32:41 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Fri, 4 Jun 2010 23:32:41 -0500 Subject: [Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs In-Reply-To: References: <13190989.410111275702777947.JavaMail.root@zimbra> <12696451.410261275703256233.JavaMail.root@zimbra> Message-ID: Just realized I only sent this to Mike. I'm resending it to swift-devel. On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar wrote: > Nope, no luck. Here's grid-proxy-info from both: > > pads: > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820/CN=53942264 > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > type : RFC 3820 compliant impersonation proxy > strength : 512 bits > path : /tmp/x509up_u1857 > timeleft : 11:52:08 > > bridled: > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > 693820/CN=1363223477 > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > type : RFC 3820 compliant impersonation proxy > strength : 512 bits > path : /tmp/x509up_u1857 > timeleft : 11:57:52 > > Used the same passphrase to get both proxies,and set no options on > grid-proxy-init. > > Arjun > > > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov wrote: > >> When you use this configuration for running jobs from a submit host to a >> PBS cluster using ssh to launch the coaster service on the PBS login host, >> you need to create a GSI proxy (using grid-proxy-init) on both the client >> and on the remote login host, using the same certificate. >> >> >> > jobmanager="ssh:pbs"/> >> 3000 >> 8 >> 1 >> 1 >> 1 >> fast >> 0.5 >> 10000 >> >> /home/wilde/swift/lab >> >> >> Arjun, this is, I think, what was causing your workflow to fail. 
>>
>> I thought, that in the past, we used to get at least a GSI (grid security
>> infrastructure) error in the detailed log file. But I don't see that in this
>> case.
>>
>> Let me know if creating proxies on both sides works for you. Be sure to
>> create it on the right PADS login host.
>>
>> David and Arjun, can you coordinate on integrating this use case into the
>> tutorial (and eventually the Users Guide)? I suggested we do a series of
>> "profiles" (with diagrams) to show the various ways of running Swift locally
>> and remotely, and provide accompanying site file entries. Dennis, when you
>> get started next week and try these cases, we'll want to find a way to do
>> automated tests for them.
>>
>> Thanks,
>>
>> Mike
>>
>> --
>>
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>>
>
>
> --
> Arjun Comar, Rose-Hulman '12
>

--
Arjun Comar, Rose-Hulman '12
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Sat Jun 5 09:53:12 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Sat, 5 Jun 2010 09:53:12 -0500 (CDT)
Subject: [Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs
In-Reply-To: <32926725.417461275749490707.JavaMail.root@zimbra>
Message-ID: <25910365.417481275749592759.JavaMail.root@zimbra>

Looking at your latest logs, in particular coaster.log in your
~/.globus/coasters dir, Swift is still unable to create a secure
connection using GSI: it thinks there is not a valid proxy in
/tmp/x509/:

Looking at your sites.xml files, this is because you are telling Swift
to run at the hostname "login.ci.uchicago.edu" - a load-balancing
virtual DNS host that rotates between login1 and login2.

I suspect that the coaster service tried to start on login2 while you
made the proxy on login1, or something similar. It's a good exercise for
you to examine all the logs involved to confirm or disprove this theory.
Look at:

- the detailed swift .log file
- the $HOME/.globus/coasters/coasters.log file
- the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode files
- your proxy files in the local /tmp dirs of the machines that
  grid-proxy-init was run on
- ifconfig (note that pads login hosts have multiple networks)

---
login1.pads.ci.uchicago.edu
login1$ ls -lt /tmp/x* | head
-rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857
---

I don't have time at the moment to trace this all back for you, but I
suggest two steps:

1) specify login1 everywhere you have "login" in sites.xml and
   auth.defaults

2) look at the logs in your ~/.globus/coasters and /scripts directory,
   perhaps moving the old logs out to a save/ directory each time (save
   them for debugging till you resolve this). You'll be able to tell from
   host names and IP addresses

You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see
the users guide and swift-user and devel lists for more info on that,
then ask on the list if still not clear).

If the problem persists after you set everything to use the specific
login host login1, then be sure to send the exact error message you are
getting and the locations of all the log files. Even though the
top-level error seems the same to you, the logs may indicate that the
underlying error changes as you correct various aspects of the
configuration and security context.

- Mike

login1$ grep login.pads *.xml
sites.xml:
sites.xml:
testsites.xml:
testsites.xml:

----- "Arjun Comar" wrote:

> Just realized I only sent this to Mike. I'm resending it to
> swift-devel.
>
>
> On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar < mandaya at rose-hulman.edu >
> wrote:
>
>
> Nope, no luck.
Here's grid-proxy-info from both: > > pads: > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > 693820/CN=53942264 > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > type : RFC 3820 compliant impersonation proxy > strength : 512 bits > path : /tmp/x509up_u1857 > timeleft : 11:52:08 > > bridled: > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > 693820/CN=1363223477 > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > type : RFC 3820 compliant impersonation proxy > strength : 512 bits > path : /tmp/x509up_u1857 > timeleft : 11:57:52 > > Used the same passphrase to get both proxies,and set no options on > grid-proxy-init. > > Arjun > > > > > > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov < wilde at mcs.anl.gov > > wrote: > > > When you use this configuration for running jobs from a submit host to > a PBS cluster using ssh to launch the coaster service on the PBS login > host, you need to create a GSI proxy (using grid-proxy-init) on both > the client and on the remote login host, using the same certificate. > > > jobmanager="ssh:pbs"/> > 3000 > 8 > 1 > 1 > 1 > fast > 0.5 > 10000 > > /home/wilde/swift/lab > > > Arjun, this is, I think, what was causing your workflow to fail. > > I thought, that in the past, we used to get at least a GSI (grid > security infrastructure) error in the detailed log file. But I don't > see that in this case. > > Let me know if creating proxies on both sides works for you. Be sure > to create it on the right PADS login host. > > David and Arjun, can you coordinate on integrating this use case into > the tutorial (and eventually the Users Guide)? I suggested we do a > series of "profiles" (with diagrams) to show the various ways of > running Swift locally and remotely, and provide accompanying site file > entries. 
Dennis, when you get started next week and try these cases, > we'll want to find a way to do automated tests for them. > > Thanks, > > Mike > > -- > > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > > > -- > Arjun Comar, Rose-Hulman '12 > > > > -- > Arjun Comar, Rose-Hulman '12 -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From mandaya at rose-hulman.edu Sun Jun 6 22:50:43 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Sun, 6 Jun 2010 22:50:43 -0500 Subject: [Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs In-Reply-To: <25910365.417481275749592759.JavaMail.root@zimbra> References: <32926725.417461275749490707.JavaMail.root@zimbra> <25910365.417481275749592759.JavaMail.root@zimbra> Message-ID: Alright, I've been playing with this for a few hours, but I can't manage to get any further. The sites.xml file isn't up to date, the one you want to see is sites-pads-pbs-coasters.xml. So I ran it a couple times, saving logs, etc. and noticed that in the .globus/coasters/coasters.log file, the jvm was being started with a -DGLOBUS_HOSTNAME=login.pads.ci.uchicago. So I tried setting GLOBUS_HOSTNAME to login1.pads.ci.uchicago. But even after that, the log file still showed the former. And the log shows an exception being thrown. So my hunch is to figure out how to force GLOBUS_HOSTNAME to get set. Anyone have any thoughts? Am I barking up the wrong tree? 
Arjun On Sat, Jun 5, 2010 at 9:53 AM, wilde at mcs.anl.gov wrote: > Looking at your latest logs, in particular coaster.log in your > ~/.globus/coasters dir, Swift is still unable to create a secure connection > using GSI: it thinks there is not a valid proxy in /tmp/x509/: > > Looking at your sites.xml files, this is because you are telling Swift to > run at the hostname "login.ci.uchicago.edu" - a load balancing virtual DNS > host rotors between login1 and login2 > > I suspect that the coaster service tried to start on login2 while you made > the proxy on login1, or something similar. Its a good exercise for you to > examine all the logs involved to confirm or disprove this theory. Look at: > > - the detailed swift .log file > - the $HOME/.globus/coasters/coasters.log file > - the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode files > - your proxy files in the local /tmp dirs of the machines that > grid-proxy-init was run on > - ifconfig (note that pads login hosts have multiple networks) > > --- > login1.pads.ci.uchicago.edu > login1$ ls -lt /tmp/x* | head > -rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857 > --- > > I dont have time at the moment to trace this all back for you, but I > suggest two steps: > > 1) specify login1 everywhere you have "login" in sites.xml and > auth.defaults > > 2) look at the logs in your ~/.globus/coasters and /scripts directory, > perhaps moving the old logs out to a save/ directory each time (save them > for debugging till you resolve this). You'll be able to tell from host names > and IP addresses > > You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see the > users guide and swift-user and devel lists for more info on that, then ask > on the list if still not clear). 
> > If the problem persists after you set everything to use the specific login > host login1, then be sure to send the the exact error message your are > getting and the locations of all the log files, as even though the top-level > error seems the same to you, the logs may indicate that the underlying error > changes as you correct various aspects of the configuration and security > context. > > - Mike > > > > login1$ grep login.pads *.xml > sites.xml: provider="ssh"/> > sites.xml: > testsites.xml: > testsites.xml: > > > > ----- "Arjun Comar" wrote: > > > Just realized I only sent this to Mike. I'm resending it to > > swift-devel. > > > > > > On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar < mandaya at rose-hulman.edu > > > wrote: > > > > > > Nope, no luck. Here's grid-proxy-info from both: > > > > pads: > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > > 693820/CN=53942264 > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > type : RFC 3820 compliant impersonation proxy > > strength : 512 bits > > path : /tmp/x509up_u1857 > > timeleft : 11:52:08 > > > > bridled: > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > > 693820/CN=1363223477 > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > type : RFC 3820 compliant impersonation proxy > > strength : 512 bits > > path : /tmp/x509up_u1857 > > timeleft : 11:57:52 > > > > Used the same passphrase to get both proxies,and set no options on > > grid-proxy-init. 
> > > > Arjun > > > > > > > > > > > > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov < wilde at mcs.anl.gov > > > wrote: > > > > > > When you use this configuration for running jobs from a submit host to > > a PBS cluster using ssh to launch the coaster service on the PBS login > > host, you need to create a GSI proxy (using grid-proxy-init) on both > > the client and on the remote login host, using the same certificate. > > > > > > > jobmanager="ssh:pbs"/> > > 3000 > > 8 > > 1 > > 1 > > 1 > > fast > > 0.5 > > 10000 > > > > /home/wilde/swift/lab > > > > > > Arjun, this is, I think, what was causing your workflow to fail. > > > > I thought, that in the past, we used to get at least a GSI (grid > > security infrastructure) error in the detailed log file. But I don't > > see that in this case. > > > > Let me know if creating proxies on both sides works for you. Be sure > > to create it on the right PADS login host. > > > > David and Arjun, can you coordinate on integrating this use case into > > the tutorial (and eventually the Users Guide)? I suggested we do a > > series of "profiles" (with diagrams) to show the various ways of > > running Swift locally and remotely, and provide accompanying site file > > entries. Dennis, when you get started next week and try these cases, > > we'll want to find a way to do automated tests for them. > > > > Thanks, > > > > Mike > > > > -- > > > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > -- > > Arjun Comar, Rose-Hulman '12 > > > > > > > > -- > > Arjun Comar, Rose-Hulman '12 > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From wilde at mcs.anl.gov Mon Jun 7 00:13:48 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Mon, 7 Jun 2010 00:13:48 -0500 (CDT)
Subject: [Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs
In-Reply-To: <6014158.436281275886873305.JavaMail.root@zimbra>
Message-ID: <29681735.436631275887628657.JavaMail.root@zimbra>

Arjun, looking briefly at your logs, it seems like the run you tried at
about 18:36 on Friday came close - it shows in your coasters.log file
that it failed because there was no valid proxy on login1.

After that, you reverted from using the more recent stable branch code
(from /home/wilde/swift/src/stable/.../dist/) back to the old 0.9
release in /common.

Like I mentioned Friday, the old 0.9 release does not have the latest
ssh provider code and thus doesn't recognize your auth.defaults
parameters.

So use my swift (or build your own from stable branch), make sure you
have a valid proxy on both sides, and try again. I suspect that will
progress further.

You can see that after you reverted back to 0.9, Swift never again got
as far as starting coasters (from your ~/.globus/coasters/coasters.log
file) because the ssh likely failed (I suspect).

- Mike

From your .log files:

login1$ fgrep .home $(ls -1t hello*.log | head -20)
helloworld-20100606-2209-uuldx126.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100606-2207-n9aul0q5.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100606-2204-f2x1rm9f.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100606-1958-zf7ppjl6.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100604-2208-omool1yb.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100604-2206-17fmgozg.log: vds.home = /software/common/swift-0.9-r1/bin/..
helloworld-20100604-1836-jp5jbuy5.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/..
helloworld-20100604-1835-83mngdfe.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1835-mvmb56f5.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1834-833fef14.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1833-7tgi5o87.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1832-gbenp2xa.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1831-044dbd38.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1830-ua5qxocg.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1827-b31yuh98.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1826-zxygui3c.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1824-iym4edt3.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. helloworld-20100604-1820-74936sp7.log: swift.home = /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. login1$ ----- "Arjun Comar" wrote: > Alright, I've been playing with this for a few hours, but I can't > manage to get any further. The sites.xml file isn't up to date, the > one you want to see is sites-pads-pbs-coasters.xml. So I ran it a > couple times, saving logs, etc. and noticed that in the > .globus/coasters/coasters.log file, the jvm was being started with a > -DGLOBUS_HOSTNAME=login.pads.ci.uchicago. So I tried setting > GLOBUS_HOSTNAME to login1.pads.ci.uchicago. But even after that, the > log file still showed the former. And the log shows an exception being > thrown. So my hunch is to figure out how to force GLOBUS_HOSTNAME to > get set. 
Anyone have any thoughts? Am I barking up the wrong tree? > > Arjun > > > On Sat, Jun 5, 2010 at 9:53 AM, wilde at mcs.anl.gov < wilde at mcs.anl.gov > > wrote: > > > Looking at your latest logs, in particular coaster.log in your > ~/.globus/coasters dir, Swift is still unable to create a secure > connection using GSI: it thinks there is not a valid proxy in > /tmp/x509/: > > Looking at your sites.xml files, this is because you are telling Swift > to run at the hostname " login.ci.uchicago.edu " - a load balancing > virtual DNS host rotors between login1 and login2 > > I suspect that the coaster service tried to start on login2 while you > made the proxy on login1, or something similar. Its a good exercise > for you to examine all the logs involved to confirm or disprove this > theory. Look at: > > - the detailed swift .log file > - the $HOME/.globus/coasters/coasters.log file > - the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode > files > - your proxy files in the local /tmp dirs of the machines that > grid-proxy-init was run on > - ifconfig (note that pads login hosts have multiple networks) > > --- > > login1.pads.ci.uchicago.edu > login1$ ls -lt /tmp/x* | head > -rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857 > --- > > I dont have time at the moment to trace this all back for you, but I > suggest two steps: > > 1) specify login1 everywhere you have "login" in sites.xml and > auth.defaults > > 2) look at the logs in your ~/.globus/coasters and /scripts directory, > perhaps moving the old logs out to a save/ directory each time (save > them for debugging till you resolve this). You'll be able to tell from > host names and IP addresses > > You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see > the users guide and swift-user and devel lists for more info on that, > then ask on the list if still not clear). 
> > If the problem persists after you set everything to use the specific > login host login1, then be sure to send the the exact error message > your are getting and the locations of all the log files, as even > though the top-level error seems the same to you, the logs may > indicate that the underlying error changes as you correct various > aspects of the configuration and security context. > > - Mike > > > > login1$ grep login.pads *.xml > sites.xml: provider="ssh"/> > sites.xml: provider="ssh"/> > testsites.xml: > testsites.xml: > > > > > > > ----- "Arjun Comar" < mandaya at rose-hulman.edu > wrote: > > > Just realized I only sent this to Mike. I'm resending it to > > swift-devel. > > > > > > On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar < > mandaya at rose-hulman.edu > > > wrote: > > > > > > Nope, no luck. Here's grid-proxy-info from both: > > > > pads: > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > > 693820/CN=53942264 > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > type : RFC 3820 compliant impersonation proxy > > strength : 512 bits > > path : /tmp/x509up_u1857 > > timeleft : 11:52:08 > > > > bridled: > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > > 693820/CN=1363223477 > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > type : RFC 3820 compliant impersonation proxy > > strength : 512 bits > > path : /tmp/x509up_u1857 > > timeleft : 11:57:52 > > > > Used the same passphrase to get both proxies,and set no options on > > grid-proxy-init. 
> > > > Arjun > > > > > > > > > > > > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov < > wilde at mcs.anl.gov > > > wrote: > > > > > > When you use this configuration for running jobs from a submit host > to > > a PBS cluster using ssh to launch the coaster service on the PBS > login > > host, you need to create a GSI proxy (using grid-proxy-init) on both > > the client and on the remote login host, using the same certificate. > > > > > > > jobmanager="ssh:pbs"/> > > 3000 > > 8 > > 1 > > 1 > > 1 > > fast > > 0.5 > > 10000 > > > > /home/wilde/swift/lab > > > > > > Arjun, this is, I think, what was causing your workflow to fail. > > > > I thought, that in the past, we used to get at least a GSI (grid > > security infrastructure) error in the detailed log file. But I don't > > see that in this case. > > > > Let me know if creating proxies on both sides works for you. Be sure > > to create it on the right PADS login host. > > > > David and Arjun, can you coordinate on integrating this use case > into > > the tutorial (and eventually the Users Guide)? I suggested we do a > > series of "profiles" (with diagrams) to show the various ways of > > running Swift locally and remotely, and provide accompanying site > file > > entries. Dennis, when you get started next week and try these cases, > > we'll want to find a way to do automated tests for them. 
> > > > Thanks, > > > > Mike > > > > -- > > > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > -- > > Arjun Comar, Rose-Hulman '12 > > > > > > > > -- > > Arjun Comar, Rose-Hulman '12 > > -- > > > > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > > > -- > Arjun Comar, Rose-Hulman '12 -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From dk0966 at cs.ship.edu Mon Jun 7 05:53:07 2010 From: dk0966 at cs.ship.edu (David Kelly) Date: Mon, 7 Jun 2010 06:53:07 -0400 Subject: [Swift-devel] swiftconfig Message-ID: Hello all, I am working on a utility to modify configuration files called swiftconfig. This is still in the early stages, so there is a lot of room for changes and new ideas. I believe there is some overlap between this project and what some other students will be doing this summer, so if anyone would like to work with me on this, please feel free. I envision swiftconfig as a simple text-based configuration program. It will be written in Perl and use the curses library for easier editing. It should hopefully make swift configuration a little easier and prevent silly mistakes like typos in xml which could keep swift from running. Everything that can be done within the editor should also be able to be done directly from the command line. This should make it easier to expand upon in the future. For example, a web or GUI based application could be written fairly quickly that would only need to call swiftconfig with the correct command line options. There are three files swiftconfig can modify: tc.data, sites.xml, and auth.defaults. 
The options for transformation mode include:

-host       # Host name
-name       # Transformation name
-path       # Path to executable
-profile    # Profile arguments, defaults to null
-tcfile     # Location of tc file. If not specified, find tc.data based on location of swift
-overwrite  # If a duplicate is found, overwrite the old entry without prompting

Since platform and installation status are no longer used, they will default to INTEL32::LINUX and INSTALLED.
Here is an example of swiftconfig in transformation mode.

$ swiftconfig -host localhost -name wc -path /usr/bin/wc

tc.data should then have the line:
localhost wc /usr/bin/wc INSTALLED INTEL32::LINUX null

If there is already an entry with the name wc, it should prompt the user to answer yes/no on whether to overwrite it (unless -overwrite is given).

For sites.xml, swiftconfig should allow the user to use existing examples or specify their own. Here are the options:

-template     # Use existing commented example for defaults (skynet, teraport, etc)
-entry        # Name of new entry (pool handle)
-gridftp      # Specify gridftp url
-jobuniverse  # Specify jobmanager universe
-joburl       # Specify jobmanager url
-jobmajor     # Specify jobmanager major value
-jobminor     # Specify jobmanager minor value
-directory    # Work directory
-exprovider   # Execution provider
-exmanager    # Execution job manager
-exurl        # Execution url
-remove       # Remove (comment out) an entry from sites.xml

So, for example, suppose a user has the following entry in sites.xml by default:

The command:

$ swiftconfig -template teraport

would uncomment that entry in sites.xml as is. The user could also modify just a part of it:

$ swiftconfig -template teraport -directory /tmp

That should modify only the work directory and leave everything else the same.

To create your own config, use -entry instead of -template:

$ swiftconfig -entry mynetwork -gridftp ftp.foo -exprovider gt4 (.. and so on)

The final mode of swiftconfig is for auth.defaults in ssh configurations.

-auth           # Set to auth mode
-sshhost        # Name of remote ssh host
-sshmode        # Either password or passphrase
-sshuser        # SSH username
-sshpassword    # SSH password
-sshpassphrase  # SSH passphrase
-sshkey         # Location of SSH key

Any other ideas or suggestions on how swiftconfig should work are welcome.

David

From benc at hawaga.org.uk Mon Jun 7 06:20:52 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 7 Jun 2010 11:20:52 +0000 (GMT)
Subject: [Swift-devel] swiftconfig
In-Reply-To:
References:
Message-ID:

You could make profile handling richer: you can have profile entries in sites.xml.

In both tc.data and sites.xml, there can be multiple profile entries. It would look nice if you have options to manipulate those directly, rather than setting all the profile entries at once.

--
http://www.hawaga.org.uk/ben/

From mandaya at rose-hulman.edu Mon Jun 7 06:37:01 2010
From: mandaya at rose-hulman.edu (Arjun Comar)
Date: Mon, 7 Jun 2010 06:37:01 -0500
Subject: [Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs
In-Reply-To: <29681735.436631275887628657.JavaMail.root@zimbra>
References: <6014158.436281275886873305.JavaMail.root@zimbra> <29681735.436631275887628657.JavaMail.root@zimbra>
Message-ID:

You're right, I'd thought I stuck the PATH info to bashrc but looks like I forgot to. I fixed it and reran, and now I've got a totally new problem, though I suspect my internet connection on this one.
When I try and run the script this time, rather than crash, it just loops on "Initializing site shared directory" a la:

[arjun at bridled ~]$ swift -sites.file .swift/sites-pads-pbs-coasters.xml -tc.file .swift/tc.data helloworld.swift
Swift svn swift-r3258 cog-r2726

RunID: 20100607-0624-5dz82mtc
Progress:
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1
Progress: Initializing site shared directory:1

ad nauseam. I've had internet issues all night so I'm wondering if it's not a problem due to that, so I'll confirm once I come to Argonne in a couple hours. Haven't checked the logs yet, I'll do that at Argonne.

Arjun

On Mon, Jun 7, 2010 at 12:13 AM, wilde at mcs.anl.gov wrote:

> Arjun, looking briefly at your logs, it seems like the run you tried at
> about 18:36 on Friday came close - it shows in your coasters.log file that
> it failed because there was no valid proxy on login1.
>
> After that, you reverted from using the more recent stable branch code
> (from /home/wilde/swift/src/stable/.../dist/) back to the old 0.9 release in
> /common.
>
> Like I mentioned Friday the old 0.9 release does not have the latest ssh
> provider code and thus doesn't recognize your auth.defaults parameters.
>
> So use my swift (or build your own from stable branch), make sure you have
> a valid proxy on both sides, and try again. I suspect that will progress
> further.
> > You can see that after you reverted back to 0.9, Swift never again got as > far as starting coasters (from your ~/.globus/coasters/coasters.log file) > because the ssh likely failed (I suspect). > > - Mike > > From your .log files: > > login1$ fgrep .home $(ls -1t hello*.log | head -20) > > helloworld-20100606-2209-uuldx126.log: vds.home = > /software/common/swift-0.9-r1/bin/.. > helloworld-20100606-2207-n9aul0q5.log: vds.home = > /software/common/swift-0.9-r1/bin/.. > helloworld-20100606-2204-f2x1rm9f.log: vds.home = > /software/common/swift-0.9-r1/bin/.. > helloworld-20100606-1958-zf7ppjl6.log: vds.home = > /software/common/swift-0.9-r1/bin/.. > helloworld-20100604-2208-omool1yb.log: vds.home = > /software/common/swift-0.9-r1/bin/.. > helloworld-20100604-2206-17fmgozg.log: vds.home = > /software/common/swift-0.9-r1/bin/.. > helloworld-20100604-1836-jp5jbuy5.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1835-83mngdfe.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1835-mvmb56f5.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1834-833fef14.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1833-7tgi5o87.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1832-gbenp2xa.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1831-044dbd38.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1830-ua5qxocg.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1827-b31yuh98.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. 
> helloworld-20100604-1826-zxygui3c.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1824-iym4edt3.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > helloworld-20100604-1820-74936sp7.log: swift.home = > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. > login1$ > > > > ----- "Arjun Comar" wrote: > > > Alright, I've been playing with this for a few hours, but I can't > > manage to get any further. The sites.xml file isn't up to date, the > > one you want to see is sites-pads-pbs-coasters.xml. So I ran it a > > couple times, saving logs, etc. and noticed that in the > > .globus/coasters/coasters.log file, the jvm was being started with a > > -DGLOBUS_HOSTNAME=login.pads.ci.uchicago. So I tried setting > > GLOBUS_HOSTNAME to login1.pads.ci.uchicago. But even after that, the > > log file still showed the former. And the log shows an exception being > > thrown. So my hunch is to figure out how to force GLOBUS_HOSTNAME to > > get set. Anyone have any thoughts? Am I barking up the wrong tree? > > > > Arjun > > > > > > On Sat, Jun 5, 2010 at 9:53 AM, wilde at mcs.anl.gov < wilde at mcs.anl.gov > > > wrote: > > > > > > Looking at your latest logs, in particular coaster.log in your > > ~/.globus/coasters dir, Swift is still unable to create a secure > > connection using GSI: it thinks there is not a valid proxy in > > /tmp/x509/: > > > > Looking at your sites.xml files, this is because you are telling Swift > > to run at the hostname " login.ci.uchicago.edu " - a load balancing > > virtual DNS host rotors between login1 and login2 > > > > I suspect that the coaster service tried to start on login2 while you > > made the proxy on login1, or something similar. Its a good exercise > > for you to examine all the logs involved to confirm or disprove this > > theory. 
Look at: > > > > - the detailed swift .log file > > - the $HOME/.globus/coasters/coasters.log file > > - the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode > > files > > - your proxy files in the local /tmp dirs of the machines that > > grid-proxy-init was run on > > - ifconfig (note that pads login hosts have multiple networks) > > > > --- > > > > login1.pads.ci.uchicago.edu > > login1$ ls -lt /tmp/x* | head > > -rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857 > > --- > > > > I dont have time at the moment to trace this all back for you, but I > > suggest two steps: > > > > 1) specify login1 everywhere you have "login" in sites.xml and > > auth.defaults > > > > 2) look at the logs in your ~/.globus/coasters and /scripts directory, > > perhaps moving the old logs out to a save/ directory each time (save > > them for debugging till you resolve this). You'll be able to tell from > > host names and IP addresses > > > > You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see > > the users guide and swift-user and devel lists for more info on that, > > then ask on the list if still not clear). > > > > If the problem persists after you set everything to use the specific > > login host login1, then be sure to send the the exact error message > > your are getting and the locations of all the log files, as even > > though the top-level error seems the same to you, the logs may > > indicate that the underlying error changes as you correct various > > aspects of the configuration and security context. > > > > - Mike > > > > > > > > login1$ grep login.pads *.xml > > sites.xml: > provider="ssh"/> > > sites.xml: > provider="ssh"/> > > testsites.xml: > > testsites.xml: > > > > > > > > > > > > > > ----- "Arjun Comar" < mandaya at rose-hulman.edu > wrote: > > > > > Just realized I only sent this to Mike. I'm resending it to > > > swift-devel. 
> > > > > > > > > On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar < > > mandaya at rose-hulman.edu > > > > wrote: > > > > > > > > > Nope, no luck. Here's grid-proxy-info from both: > > > > > > pads: > > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > > > 693820/CN=53942264 > > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > > type : RFC 3820 compliant impersonation proxy > > > strength : 512 bits > > > path : /tmp/x509up_u1857 > > > timeleft : 11:52:08 > > > > > > bridled: > > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar > > > 693820/CN=1363223477 > > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 > > > type : RFC 3820 compliant impersonation proxy > > > strength : 512 bits > > > path : /tmp/x509up_u1857 > > > timeleft : 11:57:52 > > > > > > Used the same passphrase to get both proxies,and set no options on > > > grid-proxy-init. > > > > > > Arjun > > > > > > > > > > > > > > > > > > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov < > > wilde at mcs.anl.gov > > > > wrote: > > > > > > > > > When you use this configuration for running jobs from a submit host > > to > > > a PBS cluster using ssh to launch the coaster service on the PBS > > login > > > host, you need to create a GSI proxy (using grid-proxy-init) on both > > > the client and on the remote login host, using the same certificate. > > > > > > > > > > > jobmanager="ssh:pbs"/> > > > 3000 > > > 8 > > > 1 > > > 1 > > > 1 > > > fast > > > 0.5 > > > 10000 > > > > > > /home/wilde/swift/lab > > > > > > > > > Arjun, this is, I think, what was causing your workflow to fail. > > > > > > I thought, that in the past, we used to get at least a GSI (grid > > > security infrastructure) error in the detailed log file. But I don't > > > see that in this case. > > > > > > Let me know if creating proxies on both sides works for you. 
Be sure
> > > to create it on the right PADS login host.
> > >
> > > David and Arjun, can you coordinate on integrating this use case into
> > > the tutorial (and eventually the Users Guide)? I suggested we do a
> > > series of "profiles" (with diagrams) to show the various ways of
> > > running Swift locally and remotely, and provide accompanying site file
> > > entries. Dennis, when you get started next week and try these cases,
> > > we'll want to find a way to do automated tests for them.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> > > --
> > > Michael Wilde
> > > Computation Institute, University of Chicago
> > > Mathematics and Computer Science Division
> > > Argonne National Laboratory
> > >
> > > --
> > > Arjun Comar, Rose-Hulman '12
> > >
> > > --
> > > Arjun Comar, Rose-Hulman '12
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> > --
> > Arjun Comar, Rose-Hulman '12
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>

--
Arjun Comar, Rose-Hulman '12

From mandaya at rose-hulman.edu Mon Jun 7 09:01:09 2010
From: mandaya at rose-hulman.edu (Arjun Comar)
Date: Mon, 7 Jun 2010 09:01:09 -0500
Subject: [Swift-devel] Re: Using coaster provider with jobmanager ssh:pbs
In-Reply-To:
References: <6014158.436281275886873305.JavaMail.root@zimbra> <29681735.436631275887628657.JavaMail.root@zimbra>
Message-ID:

Ok, so I'm still having the issue, meaning it wasn't just a screwy connection.
I peeked into the logs and the first thing that's popping out at me is these lines:

2010-06-07 08:48:18,929-0500 INFO SshPrivateKeyFile Parsing private key file
2010-06-07 08:48:18,935-0500 INFO SshPrivateKeyFile Private key is not in the default format, attempting parse with other supported formats
2010-06-07 08:48:18,944-0500 INFO PublicKeyAuthenticationClient Generating data to sign
2010-06-07 08:48:18,945-0500 INFO PublicKeyAuthenticationClient Preparing public key authentication request
2010-06-07 08:48:19,006-0500 INFO TransportProtocolCommon Sending SSH_MSG_USERAUTH_REQUEST
2010-06-07 08:48:19,051-0500 INFO TransportProtocolCommon Received SSH_MSG_USERAUTH_SUCCESS
2010-06-07 08:48:19,051-0500 INFO ConnectionProtocol Registering connection protocol messages
2010-06-07 08:48:19,052-0500 INFO Service ssh-connection has been requested
2010-06-07 08:48:19,052-0500 INFO Service Starting ssh-connection service thread
2010-06-07 08:48:19,053-0500 INFO AuthenticationProtocolClient Requesting authentication methods
2010-06-07 08:48:19,053-0500 INFO TransportProtocolCommon Sending SSH_MSG_USERAUTH_REQUEST
2010-06-07 08:48:19,056-0500 INFO TransportProtocolCommon Received SSH_MSG_UNIMPLEMENTED

And that's the end of the log file. To test things, I tried sticking the wrong password into the auth.defaults file to see if it would give me the same error, but it didn't. This is the same private/public key pair I've been using to ssh in for an interactive shell so I'm pretty sure the key's not at fault. But from what I can tell, it hits that last INFO message, and then produces no further logs. At least, I can't find any more. No files are being produced and stuck into the directory that's created for the run. And no directory is created under the work directory. Anyone have any thoughts?

As far as I can tell, all logging stops as soon as that " INFO TransportProtocolCommon Received SSH_MSG_UNIMPLEMENTED" line is reached, and the progress indicator just loops printing "Progress: Initializing site shared directory:1" repeatedly. Arjun On Mon, Jun 7, 2010 at 6:37 AM, Arjun Comar wrote: > You're right, I'd thought I stuck the PATH info to bashrc but looks like I > forgot to. I fixed it and reran, and now I've got a totally new problem, > though I suspect my internet connection on this one. When I try and run the > script this time, rather than crash, it just loops on "Initializing site > shared directory" a la: > [arjun at bridled ~]$ swift -sites.file .swift/sites-pads-pbs-coasters.xml > -tc.file .swift/tc.data helloworld.swift > Swift svn swift-r3258 cog-r2726 > > RunID: 20100607-0624-5dz82mtc > Progress: > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > Progress: Initializing site shared directory:1 > > ad nauseaum. I've had internet issues all night so I'm wondering if it's > not a problem due to that, so I'll confirm once I come to Argonne in a > couple hours. Haven't checked the logs yet, I'll do that at Argonne. 
> > Arjun > > > On Mon, Jun 7, 2010 at 12:13 AM, wilde at mcs.anl.gov wrote: > >> Arjun, looking briefly at your logs, it seems like the run you tried at >> about 18:36 on Friday came close - it shows in your coasters.log file that >> it failed because there was no valid proxy on login 1. >> >> After that, you reverted from using the more recent stable branch code >> (from /home/wilde/swift/src/stable/.../dist/ back tp the old 0.9 release in >> /common. >> >> Like I mentioned Friday the old 0.9 release does not have the latest ssh >> provider code and thus doesnt recognize your auth.default parameters. >> >> So use my swift (or build your own from stable branch), make sure you have >> a valid proxy on both sides, and try again. I suspect that will progress >> further. >> >> You can see that after you reverted back to 0.9, Swift never again got as >> far as starting coasters (from your ~/.globus/coasters/coasters.log file) >> because the ssh likely failed (I suspect). >> >> - Mike >> >> From your .log files: >> >> login1$ fgrep .home $(ls -1t hello*.log | head -20) >> >> helloworld-20100606-2209-uuldx126.log: vds.home = >> /software/common/swift-0.9-r1/bin/.. >> helloworld-20100606-2207-n9aul0q5.log: vds.home = >> /software/common/swift-0.9-r1/bin/.. >> helloworld-20100606-2204-f2x1rm9f.log: vds.home = >> /software/common/swift-0.9-r1/bin/.. >> helloworld-20100606-1958-zf7ppjl6.log: vds.home = >> /software/common/swift-0.9-r1/bin/.. >> helloworld-20100604-2208-omool1yb.log: vds.home = >> /software/common/swift-0.9-r1/bin/.. >> helloworld-20100604-2206-17fmgozg.log: vds.home = >> /software/common/swift-0.9-r1/bin/.. >> helloworld-20100604-1836-jp5jbuy5.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1835-83mngdfe.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. 
>> helloworld-20100604-1835-mvmb56f5.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1834-833fef14.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1833-7tgi5o87.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1832-gbenp2xa.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1831-044dbd38.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1830-ua5qxocg.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1827-b31yuh98.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1826-zxygui3c.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1824-iym4edt3.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> helloworld-20100604-1820-74936sp7.log: swift.home = >> /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin/.. >> login1$ >> >> >> >> ----- "Arjun Comar" wrote: >> >> > Alright, I've been playing with this for a few hours, but I can't >> > manage to get any further. The sites.xml file isn't up to date, the >> > one you want to see is sites-pads-pbs-coasters.xml. So I ran it a >> > couple times, saving logs, etc. and noticed that in the >> > .globus/coasters/coasters.log file, the jvm was being started with a >> > -DGLOBUS_HOSTNAME=login.pads.ci.uchicago. So I tried setting >> > GLOBUS_HOSTNAME to login1.pads.ci.uchicago. But even after that, the >> > log file still showed the former. And the log shows an exception being >> > thrown. So my hunch is to figure out how to force GLOBUS_HOSTNAME to >> > get set. 
Anyone have any thoughts? Am I barking up the wrong tree? >> > >> > Arjun >> > >> > >> > On Sat, Jun 5, 2010 at 9:53 AM, wilde at mcs.anl.gov < wilde at mcs.anl.gov >> > > wrote: >> > >> > >> > Looking at your latest logs, in particular coaster.log in your >> > ~/.globus/coasters dir, Swift is still unable to create a secure >> > connection using GSI: it thinks there is not a valid proxy in >> > /tmp/x509/: >> > >> > Looking at your sites.xml files, this is because you are telling Swift >> > to run at the hostname " login.ci.uchicago.edu " - a load balancing >> > virtual DNS host rotors between login1 and login2 >> > >> > I suspect that the coaster service tried to start on login2 while you >> > made the proxy on login1, or something similar. Its a good exercise >> > for you to examine all the logs involved to confirm or disprove this >> > theory. Look at: >> > >> > - the detailed swift .log file >> > - the $HOME/.globus/coasters/coasters.log file >> > - the $HOME/.globus/scripts PBS submit file, stdout/err, and exitcode >> > files >> > - your proxy files in the local /tmp dirs of the machines that >> > grid-proxy-init was run on >> > - ifconfig (note that pads login hosts have multiple networks) >> > >> > --- >> > >> > login1.pads.ci.uchicago.edu >> > login1$ ls -lt /tmp/x* | head >> > -rw------- 1 arjun ci-users 2995 Jun 4 22:01 /tmp/x509up_u1857 >> > --- >> > >> > I dont have time at the moment to trace this all back for you, but I >> > suggest two steps: >> > >> > 1) specify login1 everywhere you have "login" in sites.xml and >> > auth.defaults >> > >> > 2) look at the logs in your ~/.globus/coasters and /scripts directory, >> > perhaps moving the old logs out to a save/ directory each time (save >> > them for debugging till you resolve this). 
You'll be able to tell from >> > host names and IP addresses >> > >> > You may need to set GLOBUS_HOSTNAME, but I am not sure about that (see >> > the users guide and swift-user and devel lists for more info on that, >> > then ask on the list if still not clear). >> > >> > If the problem persists after you set everything to use the specific >> > login host login1, then be sure to send the the exact error message >> > your are getting and the locations of all the log files, as even >> > though the top-level error seems the same to you, the logs may >> > indicate that the underlying error changes as you correct various >> > aspects of the configuration and security context. >> > >> > - Mike >> > >> > >> > >> > login1$ grep login.pads *.xml >> > sites.xml: > > provider="ssh"/> >> > sites.xml: > > provider="ssh"/> >> > testsites.xml: >> > testsites.xml: >> > >> > >> > >> > >> > >> > >> > ----- "Arjun Comar" < mandaya at rose-hulman.edu > wrote: >> > >> > > Just realized I only sent this to Mike. I'm resending it to >> > > swift-devel. >> > > >> > > >> > > On Fri, Jun 4, 2010 at 10:11 PM, Arjun Comar < >> > mandaya at rose-hulman.edu >> > > > wrote: >> > > >> > > >> > > Nope, no luck. 
Here's grid-proxy-info from both: >> > > >> > > pads: >> > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar >> > > 693820/CN=53942264 >> > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 >> > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 >> > > type : RFC 3820 compliant impersonation proxy >> > > strength : 512 bits >> > > path : /tmp/x509up_u1857 >> > > timeleft : 11:52:08 >> > > >> > > bridled: >> > > subject : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar >> > > 693820/CN=1363223477 >> > > issuer : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 >> > > identity : /DC=org/DC=doegrids/OU=People/CN=Arjun Comar 693820 >> > > type : RFC 3820 compliant impersonation proxy >> > > strength : 512 bits >> > > path : /tmp/x509up_u1857 >> > > timeleft : 11:57:52 >> > > >> > > Used the same passphrase to get both proxies,and set no options on >> > > grid-proxy-init. >> > > >> > > Arjun >> > > >> > > >> > > >> > > >> > > >> > > On Fri, Jun 4, 2010 at 9:00 PM, wilde at mcs.anl.gov < >> > wilde at mcs.anl.gov >> > > > wrote: >> > > >> > > >> > > When you use this configuration for running jobs from a submit host >> > to >> > > a PBS cluster using ssh to launch the coaster service on the PBS >> > login >> > > host, you need to create a GSI proxy (using grid-proxy-init) on both >> > > the client and on the remote login host, using the same certificate. >> > > >> > > >> > > > > > jobmanager="ssh:pbs"/> >> > > 3000 >> > > 8 >> > > 1 >> > > 1 >> > > 1 >> > > fast >> > > 0.5 >> > > 10000 >> > > >> > > /home/wilde/swift/lab >> > > >> > > >> > > Arjun, this is, I think, what was causing your workflow to fail. >> > > >> > > I thought, that in the past, we used to get at least a GSI (grid >> > > security infrastructure) error in the detailed log file. But I don't >> > > see that in this case. >> > > >> > > Let me know if creating proxies on both sides works for you. Be sure >> > > to create it on the right PADS login host. 
>> > >
>> > > David and Arjun, can you coordinate on integrating this use case into
>> > > the tutorial (and eventually the Users Guide)? I suggested we do a
>> > > series of "profiles" (with diagrams) to show the various ways of
>> > > running Swift locally and remotely, and provide accompanying site file
>> > > entries. Dennis, when you get started next week and try these cases,
>> > > we'll want to find a way to do automated tests for them.
>> > >
>> > > Thanks,
>> > >
>> > > Mike
>> > >
>> > > --
>> > > Michael Wilde
>> > > Computation Institute, University of Chicago
>> > > Mathematics and Computer Science Division
>> > > Argonne National Laboratory
>> > >
>> > > --
>> > > Arjun Comar, Rose-Hulman '12
>> > >
>> > > --
>> > > Arjun Comar, Rose-Hulman '12
>> >
>> > --
>> > Michael Wilde
>> > Computation Institute, University of Chicago
>> > Mathematics and Computer Science Division
>> > Argonne National Laboratory
>> >
>> > --
>> > Arjun Comar, Rose-Hulman '12
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>
> --
> Arjun Comar, Rose-Hulman '12
>

--
Arjun Comar, Rose-Hulman '12

From wilde at mcs.anl.gov Mon Jun 7 09:17:47 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 7 Jun 2010 09:17:47 -0500 (CDT)
Subject: [Swift-devel] swiftconfig
In-Reply-To:
Message-ID: <20236402.442211275920267828.JavaMail.root@zimbra>

David, good start.

Attached is an example of a "run" script from a Swift application (protein folding). It's the one I mentioned that sets up sites.xml, tc.data, and swift.properties using shell "here documents" with variable substitution. Maybe you can get a few ideas from it.

A few thoughts on the below:

- unless it's very little work, I'd stay away from curses.
Use either the plain command line, or simple prompts if interaction is needed. Rather than curses, I'd think you're better off going to the web app that you originally proposed. Maybe that's not such a bad idea.

- things that are less frequently changed, like the ssh auth parameters or swift.properties, seem more amenable to interactive or forms-based configuration.

I'll try to provide more substantive comments later.

Mike

----- "David Kelly" wrote:

> Hello all,
>
> I am working on a utility to modify configuration files called
> swiftconfig. This is still in the early stages, so there is a lot of
> room for changes and new ideas. I believe there is some overlap
> between this project and what some other students will be doing this
> summer, so if anyone would like to work with me on this, please feel
> free.
>
> I envision swiftconfig as a simple text-based configuration program.
> It will be written in Perl and use the curses library for easier
> editing. It should hopefully make swift configuration a little easier
> and prevent silly mistakes like typos in xml which could keep swift
> from running. Everything that can be done within the editor should
> also be able to be done directly from the command line. This should
> make it easier to expand upon in the future. For example, a web or GUI
> based application could be written fairly quickly that would only need
> to call swiftconfig with the correct command line options.
>
> There are three files swiftconfig can modify: tc.data, sites.xml, and
> auth.defaults.
>
> The options for transformation mode include
>
> -host # Host name
> -name # Translation name
> -path # Path to executable
> -profile # Profile arguments, defaults to null
> -tcfile # Location of tc file. If not specified, find tc.data based on
> location of swift
> -overwrite # If a duplicate is found, overwrite the old entry without
> prompting
>
> Since platform and installation status are no longer used, they will
> default to INTEL32::LINUX and INSTALLED.
> Here is an example of swiftconfig in transformation mode. > > $ swiftconfig -host localhost -name wc -path /usr/bin/wc > > tc.data should then have the line: > localhost wc /usr/bin/wc INSTALLED INTEL32::LINUX null > > If there is already an entry with the name wc, it should prompt the > user to answer yes/no if the user wants to overwrite it (unless > -overwrite is given) > > For sites.xml, swiftconfig should allow the user to use existing > examples or specify their own. Here are the options: > > -template # Use existing commented example for defaults (skynet, > teraport, etc) > -entry # Name of new entry (pool handle) > -gridftp # Specify gridftp url > -jobuniverse # Specify jobmanager universe > -joburl # Specify jobmanager url > -jobmajor # Specify jobmanager major value > -jobminor # Specify jobmanager minor value > -directory # Work directory > -exprovider # Execution provider > -exmanager # Execution job manager > -exurl # Execution url > -remove # Remove (comment out) an entry from sites.xml > > So, for example suppose a user has the following entry in sites.xml by > default: > > > > The command: > > $ swiftconfig -template teraport > > Which would uncomment that from sites.xml as is. The user could also > modify just a part of it: > > $ swiftconfig -template teraport -directory /tmp > > That should modify only the workdirectory and leave everything else > the same. > > To create your own config, use -entry instead of -template > > $ swiftconfig -entry mynetwork -gridftp ftp.foo -exprovider gt4 (.. > and so on) > > The final mode of swiftconfig is for auth.log in ssh configurations. > > -auth # Set to auth mode > -sshhost # Name of remote ssh host > -sshmode # Either password or passphrase > -sshuser # SSH username > -sshpassword # SSH password > -sshpassphrase # SSH passphrase > -sshkey # Location of SSH key > > Any other ideas or suggestions on how swiftconfig should work are > welcome. 
> > David > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From mandaya at rose-hulman.edu Mon Jun 7 09:14:40 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Mon, 7 Jun 2010 09:14:40 -0500 Subject: [Swift-devel] Re: swiftconfig Message-ID: Hey David, I'd love to see the source for this. Is it in svn? If you'd like, I can write the ncurses frontend (I think that's what you meant, correct me if I'm wrong) while I work on getting the various quickset profiles functional. Just in case you haven't been filled in yet, Mike's having me work on creating a few quickset profiles for swiftconfig, the idea being that sometimes it's just easier to drop in a working set of configurations and tweak from there rather than starting from scratch every time. So far, here are the profiles Mike came up with and that I've been working on: 1) Remote execution via ssh, single site 2) Local execution via PBS, single site 3) Local execution via Coasters+PBS, single site 4) Remote execution via Coasters+SSH:PBS, single site Templates for the sites.xml files for profiles 1-3 are ready, tested, and functional. I'm having issues getting 4 to work (see my correspondence with Mike in swift-devel), but that should hopefully get resolved today. After that, my major priority is to create and test a variety of multi-site profiles. The variety for these profiles is primarily going to come from the number of sites, the coaster settings, the specific scheduler used at each site, etc. I don't have any particular thoughts on the specifics for these profiles, but I'll be working on that today. In any case, let me know if I can help with anything. Arjun > Hello all, > > I am working on a utility to modify configuration files called swiftconfig.
> This is still in the early stages, so there is a lot of room for changes > and > new ideas. I believe there is some overlap between this project and what > some other students will be doing this summer, so if anyone would like to > work with me on this, please feel free. > > I envision swiftconfig as a simple text-based configuration program. It > will > be written in Perl and use the curses library for easier editing. It should > hopefully make swift configuration a little easier and prevent silly > mistakes like typos in xml which could keep swift from running. Everything > that can be done within the editor should also be able to be done directly > from the command line. This should make it easier to expand upon in the > future. For example, a web or GUI based application could be written fairly > quickly that would only need to call swiftconfig with the correct command > line options. > > There are three files swiftconfig can modify: tc.data, sites.xml, and > auth.defaults. > > The options for transformation mode include > > -host # Host name > -name # Translation name > -path # Path to executable > -profile # Profile arguments, defaults to null > -tcfile # Location of tc file. If not specified, find tc.data > based on location of swift > -overwrite # If a duplicate is found, overwrite the old entry without > prompting > > Since platform and installation status are no longer used, they will > default > to INTEL32::LINUX and INSTALLED. > Here is an example of swiftconfig in transformation mode. > > $ swiftconfig -host localhost -name wc -path /usr/bin/wc > > tc.data should then have the line: > localhost wc /usr/bin/wc INSTALLED INTEL32::LINUX null > > If there is already an entry with the name wc, it should prompt the user to > answer yes/no if the user wants to overwrite it (unless -overwrite is > given) > > For sites.xml, swiftconfig should allow the user to use existing examples > or > specify their own. 
Here are the options: > > -template # Use existing commented example for defaults (skynet, > teraport, etc) > -entry # Name of new entry (pool handle) > -gridftp # Specify gridftp url > -jobuniverse # Specify jobmanager universe > -joburl # Specify jobmanager url > -jobmajor # Specify jobmanager major value > -jobminor # Specify jobmanager minor value > -directory # Work directory > -exprovider # Execution provider > -exmanager # Execution job manager > -exurl # Execution url > -remove # Remove (comment out) an entry from sites.xml > > So, for example suppose a user has the following entry in sites.xml by > default: > > > > The command: > > $ swiftconfig -template teraport > > Which would uncomment that from sites.xml as is. The user could also modify > just a part of it: > > $ swiftconfig -template teraport -directory /tmp > > That should modify only the workdirectory and leave everything else the > same. > > To create your own config, use -entry instead of -template > > $ swiftconfig -entry mynetwork -gridftp ftp.foo -exprovider gt4 (.. and so > on) > > The final mode of swiftconfig is for auth.log in ssh configurations. > > -auth # Set to auth mode > -sshhost # Name of remote ssh host > -sshmode # Either password or passphrase > -sshuser # SSH username > -sshpassword # SSH password > -sshpassphrase # SSH passphrase > -sshkey # Location of SSH key > > Any other ideas or suggestions on how swiftconfig should work are welcome. > > David > > > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bugzilla-daemon at mcs.anl.gov Mon Jun 7 10:56:57 2010 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Mon, 7 Jun 2010 10:56:57 -0500 (CDT) Subject: [Swift-devel] [Bug 225] New: SSH Based Runs fail in current trunk; successful in 0.9 Message-ID: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=225 Summary: SSH Based Runs fail in current trunk; successful in 0.9 Product: Swift Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: mandaya at rose-hulman.edu CC: wilde at mcs.anl.gov, wozniak at mcs.anl.gov Created an attachment (id=288) --> (http://bugzilla.mcs.anl.gov/swift/attachment.cgi?id=288) helloworld-20100607-1042-fdsk7h87.log A plain ssh run of swift fails in current trunk. Remote execution, say from bridled to pads, through ssh (no coasters, pbs, etc., though adding them in doesn't change output) causes the ssh connection to hang, causing swift to loop indefinitely producing the following output: [arjun at bridled ~]$ swift -sites.file .swift/sites.xml -tc.file .swift/tc.data helloworld.swift Swift svn swift-r3258 cog-r2726 RunID: 20100607-1042-fdsk7h87 Progress: Progress: Initializing site shared directory:1 Progress: Initializing site shared directory:1 Progress: Initializing site shared directory:1 The only log file produced is the basic log file for the run (helloworld-20100607-*.log, full log attached) that halts at the following section of output: 2010-06-07 10:42:03,061-0500 INFO SshPrivateKeyFile Parsing private key file 2010-06-07 10:42:03,066-0500 INFO SshPrivateKeyFile Private key is not in the default format, attempting parse with other supported formats 2010-06-07 10:42:03,076-0500 INFO PublicKeyAuthenticationClient Generating data to sign 2010-06-07 10:42:03,076-0500 INFO PublicKeyAuthenticationClient Preparing public key authentication request 2010-06-07 10:42:03,139-0500 INFO TransportProtocolCommon 
Sending SSH_MSG_USERAUTH_REQUEST 2010-06-07 10:42:03,186-0500 INFO TransportProtocolCommon Received SSH_MSG_USERAUTH_SUCCESS 2010-06-07 10:42:03,187-0500 INFO ConnectionProtocol Registering connection protocol messages 2010-06-07 10:42:03,188-0500 INFO Service ssh-connection has been requested 2010-06-07 10:42:03,188-0500 INFO Service Starting ssh-connection service thread 2010-06-07 10:42:03,188-0500 INFO AuthenticationProtocolClient Requesting authentication methods 2010-06-07 10:42:03,188-0500 INFO TransportProtocolCommon Sending SSH_MSG_USERAUTH_REQUEST 2010-06-07 10:42:03,192-0500 INFO TransportProtocolCommon Received SSH_MSG_UNIMPLEMENTED This led me to suspect the private key pass, etc., but everything checks out. The private key still functions for interactive ssh logins, and the password is correct. Setting the auth.defaults file's host.password setting to something different produces an exception as expected. It's also important to note that no further log files are produced. Nothing is created (not even the folder) in the work directory. There are no scripts added in .globus/scripts. There are no files added to helloworld-20100607-*.d/. Swift hangs and loops indefinitely, perpetually writing lines of "Progress: Initializing site shared directory:1" to stdout. This behavior does not exist in the swift-0.9 release, where the script runs perfectly without issues, locally and remotely via SSH. sites.xml: /home/arjun 0 Let me know if anyone can confirm. Arjun -- Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
From wilde at mcs.anl.gov Mon Jun 7 12:25:41 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Mon, 7 Jun 2010 12:25:41 -0500 (CDT) Subject: [Swift-devel] swiftconfig In-Reply-To: <31022116.455581275931536943.JavaMail.root@zimbra> Message-ID: <30153961.455601275931541141.JavaMail.root@zimbra> With attachment (and sorry for mistyping your name, David ;) -- Davis, good start. Attached is an example of a "run" script from a Swift application (protein folding). It's the one I mentioned that sets up sites.xml, tc.data, and swift.properties using shell "here documents" with variable substitution. Maybe you can get a few ideas from it. A few thoughts on the proposal below: - unless it's very little work, I'd stay away from curses. Use either plain command line, or simple prompts if interaction is needed. Rather than curses, I'd think you're better off going to the web app that you originally proposed. Maybe that's not such a bad idea. - things that change less frequently, like the ssh auth parameters or swift.properties, seem more amenable to interactive or forms-based configuration. I'll try to provide more substantive comments later. Mike ----- "David Kelly" wrote: > Hello all, > > I am working on a utility to modify configuration files called > swiftconfig. This is still in the early stages, so there is a lot of > room for changes and new ideas. I believe there is some overlap > between this project and what some other students will be doing this > summer, so if anyone would like to work with me on this, please feel > free. > > I envision swiftconfig as a simple text-based configuration program. > It will be written in Perl and use the curses library for easier > editing. It should hopefully make swift configuration a little easier > and prevent silly mistakes like typos in xml which could keep swift > from running. Everything that can be done within the editor should > also be able to be done directly from the command line.
This should > make it easier to expand upon in the future. For example, a web or GUI > based application could be written fairly quickly that would only need > to call swiftconfig with the correct command line options. > > There are three files swiftconfig can modify: tc.data, sites.xml, and > auth.defaults. > > The options for transformation mode include > > -host # Host name > -name # Translation name > -path # Path to executable > -profile # Profile arguments, defaults to null > -tcfile # Location of tc file. If not specified, find tc.data based on > location of swift > -overwrite # If a duplicate is found, overwrite the old entry without > prompting > > Since platform and installation status are no longer used, they will > default to INTEL32::LINUX and INSTALLED. > Here is an example of swiftconfig in transformation mode. > > $ swiftconfig -host localhost -name wc -path /usr/bin/wc > > tc.data should then have the line: > localhost wc /usr/bin/wc INSTALLED INTEL32::LINUX null > > If there is already an entry with the name wc, it should prompt the > user to answer yes/no if the user wants to overwrite it (unless > -overwrite is given) > > For sites.xml, swiftconfig should allow the user to use existing > examples or specify their own. Here are the options: > > -template # Use existing commented example for defaults (skynet, > teraport, etc) > -entry # Name of new entry (pool handle) > -gridftp # Specify gridftp url > -jobuniverse # Specify jobmanager universe > -joburl # Specify jobmanager url > -jobmajor # Specify jobmanager major value > -jobminor # Specify jobmanager minor value > -directory # Work directory > -exprovider # Execution provider > -exmanager # Execution job manager > -exurl # Execution url > -remove # Remove (comment out) an entry from sites.xml > > So, for example suppose a user has the following entry in sites.xml by > default: > > > > The command: > > $ swiftconfig -template teraport > > Which would uncomment that from sites.xml as is. 
The user could also > modify just a part of it: > > $ swiftconfig -template teraport -directory /tmp > > That should modify only the workdirectory and leave everything else > the same. > > To create your own config, use -entry instead of -template > > $ swiftconfig -entry mynetwork -gridftp ftp.foo -exprovider gt4 (.. > and so on) > > The final mode of swiftconfig is for auth.log in ssh configurations. > > -auth # Set to auth mode > -sshhost # Name of remote ssh host > -sshmode # Either password or passphrase > -sshuser # SSH username > -sshpassword # SSH password > -sshpassphrase # SSH passphrase > -sshkey # Location of SSH key > > Any other ideas or suggestions on how swiftconfig should work are > welcome. > > David > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: run.raptorloops.sh Type: application/x-shellscript Size: 12247 bytes Desc: not available URL: From wilde at mcs.anl.gov Mon Jun 7 17:48:45 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 7 Jun 2010 17:48:45 -0500 (CDT) Subject: [Swift-devel] Clone provider-wonky to create a queue simulator for scheduling simulation Message-ID: <17371416.482721275950925883.JavaMail.root@zimbra> Arjun, If you pursue Mihael's suggestion from his talk last Friday on evaluating the Swift scheduler's behavior with a simulator, you might want to clone provider-wonky to make a provider-queuesim. While provider-wonky creates artificial failures, provider-queuesim could create artificial queuing delays based on various models. 
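To make "artificial queuing delays based on various models" a little more concrete, here is a minimal Python sketch of a table of delay samplers. It is an illustration only: provider-wonky lives in the Swift source tree and none of the names below come from it. The model names, their parameters, and the choice of distributions are all assumptions made for the example.

```python
import random


def make_delay_model(kind, rng, **params):
    """Return a zero-argument function that samples one queue delay in seconds.

    All model names and parameters here are hypothetical.
    """
    if kind == "constant":
        seconds = params["seconds"]
        return lambda: seconds
    if kind == "exponential":
        mean = params["mean"]
        return lambda: rng.expovariate(1.0 / mean)
    if kind == "normal":
        mean, stddev = params["mean"], params["stddev"]
        # Clamp at zero: a queue delay cannot be negative.
        return lambda: max(0.0, rng.gauss(mean, stddev))
    if kind == "lognormal":
        mu, sigma = params["mu"], params["sigma"]
        return lambda: rng.lognormvariate(mu, sigma)
    raise ValueError("unknown delay model: %s" % kind)


rng = random.Random(42)  # fixed seed so simulated runs are repeatable
models = {
    "constant": make_delay_model("constant", rng, seconds=30.0),
    "exponential": make_delay_model("exponential", rng, mean=60.0),
    "normal": make_delay_model("normal", rng, mean=60.0, stddev=20.0),
    "lognormal": make_delay_model("lognormal", rng, mu=4.0, sigma=1.0),
}
for name, sample in models.items():
    delays = [sample() for _ in range(1000)]
    print(name, "mean delay (s):", sum(delays) / len(delays))
```

Keeping each model behind the same zero-argument interface means a simulated provider could switch models from a single configuration value, whichever distributions turn out to be realistic.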
- Mike

From wilde at mcs.anl.gov Mon Jun 7 22:11:22 2010
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Mon, 7 Jun 2010 22:11:22 -0500 (CDT)
Subject: [Swift-devel] swiftconfig
In-Reply-To: <21818475.486001275966678457.JavaMail.root@zimbra>
Message-ID: <11154785.486021275966682364.JavaMail.root@zimbra>

David, after reading the command line syntax you proposed, I realize this is going to take more thought and discussion.

What I was originally envisioning was a swiftconfig command that stored a set of templates for various sites, and adjusted them based on the individual user (eg the and user-selected options (like local vs Globus vs ssh access; coasters vs non-coasters; and for coasters, a reduced set of options that could tailor the options similarly into a few common profiles).

Justin suggested in discussion today that we start by just building a manual catalog of a good set of examples that covers the local and grid systems that many of us use regularly, so that, at least, people can copy a documented example and adjust as needed.

This might be a good exercise from which we could identify the patterns that a swiftconfig command could use to simplify the process.

I started trying to outline some of the patterns; we should match these up against profiles for the 8 or so systems that I mentioned in a message to this list last week (pads, fusion, ranger, etc).

I'm starting to feel myself swayed by your suggestions for an interactive interface, but I find it hard at the moment to see what that would look like. Possibly a set of drop-downs or range selection boxes that adjust as the user narrows a site's profile in a fashion that mimics the outline below?

- Mike

1. Local immediate execution

    1.1 without coasters
    1.2 with coasters

2. Local scheduled execution

    2.1 without coasters

        2.1.1 PBS (eg: TeraPort, PADS, Fusion, many TG sites)
        2.1.2 SGE (eg: Ranger, godzilla, sisboombah)
        2.1.3 Condor (eg: Purdue TeraGrid Condor pool; HNL condor pool; UC Condor pool???)

    2.2 with coasters

        2.2.1 PBS (eg: TeraPort, PADS, Fusion, many TG sites)
        2.2.2 SGE (eg: Ranger, godzilla, sisboombah)
        2.2.3 Condor (eg: Purdue TeraGrid Condor pool; HNL condor pool; UC Condor pool???)

3. ssh to remote sites

    3.1 to local immediate
    3.2 to coasters
        to coasters pbs, sge, and condor

4. GT2 to remote sites

    4.1 Non-Condor-G
        4.1.1 Plain
        4.1.2 Coasters
    4.2 Condor-G
        4.2.1 Plain
        4.2.2 Coasters

----- "David Kelly" wrote:

> Hello all,
>
> I am working on a utility to modify configuration files called
> swiftconfig. This is still in the early stages, so there is a lot of
> room for changes and new ideas. I believe there is some overlap
> between this project and what some other students will be doing this
> summer, so if anyone would like to work with me on this, please feel
> free.
>
> I envision swiftconfig as a simple text-based configuration program.
> It will be written in Perl and use the curses library for easier
> editing. It should hopefully make swift configuration a little easier
> and prevent silly mistakes like typos in xml which could keep swift
> from running. Everything that can be done within the editor should
> also be able to be done directly from the command line. This should
> make it easier to expand upon in the future. For example, a web or GUI
> based application could be written fairly quickly that would only need
> to call swiftconfig with the correct command line options.
>
> There are three files swiftconfig can modify: tc.data, sites.xml, and
> auth.defaults.
>
> The options for transformation mode include
>
> -host # Host name
> -name # Translation name
> -path # Path to executable
> -profile # Profile arguments, defaults to null
> -tcfile # Location of tc file. If not specified, find tc.data based on
> location of swift
> -overwrite # If a duplicate is found, overwrite the old entry without
> prompting
>
> Since platform and installation status are no longer used, they will
> default to INTEL32::LINUX and INSTALLED.
> Here is an example of swiftconfig in transformation mode. > > $ swiftconfig -host localhost -name wc -path /usr/bin/wc > > tc.data should then have the line: > localhost wc /usr/bin/wc INSTALLED INTEL32::LINUX null > > If there is already an entry with the name wc, it should prompt the > user to answer yes/no if the user wants to overwrite it (unless > -overwrite is given) > > For sites.xml, swiftconfig should allow the user to use existing > examples or specify their own. Here are the options: > > -template # Use existing commented example for defaults (skynet, > teraport, etc) > -entry # Name of new entry (pool handle) > -gridftp # Specify gridftp url > -jobuniverse # Specify jobmanager universe > -joburl # Specify jobmanager url > -jobmajor # Specify jobmanager major value > -jobminor # Specify jobmanager minor value > -directory # Work directory > -exprovider # Execution provider > -exmanager # Execution job manager > -exurl # Execution url > -remove # Remove (comment out) an entry from sites.xml > > So, for example suppose a user has the following entry in sites.xml by > default: > > > > The command: > > $ swiftconfig -template teraport > > Which would uncomment that from sites.xml as is. The user could also > modify just a part of it: > > $ swiftconfig -template teraport -directory /tmp > > That should modify only the workdirectory and leave everything else > the same. > > To create your own config, use -entry instead of -template > > $ swiftconfig -entry mynetwork -gridftp ftp.foo -exprovider gt4 (.. > and so on) > > The final mode of swiftconfig is for auth.log in ssh configurations. > > -auth # Set to auth mode > -sshhost # Name of remote ssh host > -sshmode # Either password or passphrase > -sshuser # SSH username > -sshpassword # SSH password > -sshpassphrase # SSH passphrase > -sshkey # Location of SSH key > > Any other ideas or suggestions on how swiftconfig should work are > welcome. 
> > David > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Mon Jun 7 22:20:25 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 7 Jun 2010 22:20:25 -0500 Subject: [Swift-devel] swiftconfig In-Reply-To: <11154785.486021275966682364.JavaMail.root@zimbra> References: <21818475.486001275966678457.JavaMail.root@zimbra> <11154785.486021275966682364.JavaMail.root@zimbra> Message-ID: Yup, the CS department has a condor pool 2010/6/7 : > David, after reading the command line syntax you proposed, I realize this is going to take more thought and discussion. > > What I was originally envisioning was a swiftconfig command that stored a set of templates for various sites, and adjusted them based on the individual user (eg the and user-selected options (like local vs Globus vs ssh access; coasters vs non-coasters; and for coasters, a reduced set of options that could tailor the options similarly into a few common profiles). > > Justin suggested in discussion today that we start by just building a manual catalog of a good set of examples that covers the local and grid systems that many of use use regularly, so that, at least, people can copy a documented example and adjust as needed. > > This might a good exercise from which we could identify that patterns that a swiftconfig command could use to simplify the process. > > I started trying to outline some of the patterns; we should match these up against profiles for the 8 or so systems that I mentioned in a message to this list last week (pads, fusion, ranger, etc). > > Im starting to feel myself swayed by your suggests for an interactive interface, but I find it hard at the moment to see what that would look like. 
Possible a set of drop-downs or range selection boxes that adjust as the user narrows a site's profile in a fashion that mimics the outline below?
>
> - Mike
>
> 1. Local immediate execution
>
>     1.1 without coasters
>     1.2 with coasters
>
> 2. Local scheduled execution
>
>     2.1 without coasters
>
>         2.1.1 PBS (eg: TeraPort, PADS, Fusion, many TG sites)
>         2.1.2 SGE (eg: Ranger, godzilla, sisboombah)
>         2.1.3 Condor (eg: Purdue TeraGrid Condor pool; HNL condor pool; UC Condor pool???)
>
>     1.2 with coasters
>
>         2.2.1 PBS (eg: TeraPort, PADS, Fusion, many TG sites
>         2.2.2 SGE (eg: Ranger, godzilla, sisboombah)
>         2.2.3 Condor (eg: Purdue TeraGrid Condor pool; HNL condor pool; UC Condor pool???)
>
> 3. ssh to remote sites
>
>     3.1 to local immediate
>     3.2 to coasters
>         to coasters pbs, sge, and condor
>
> 4. GT2 to remote sites
>
>     4.1 Non-Condor-G
>         4.1.1 Plain
>         4.1.2 Coasters
>     4.2 Condor-G
>         4.2.1 Plain
>         4.2.2 Coasters
>

From benc at hawaga.org.uk Tue Jun 8 01:00:44 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 8 Jun 2010 06:00:44 +0000 (GMT)
Subject: [Swift-devel] swiftconfig
In-Reply-To: <11154785.486021275966682364.JavaMail.root@zimbra>
References: <11154785.486021275966682364.JavaMail.root@zimbra>
Message-ID:

> Justin suggested in discussion today that we start by
> just
> building a
> manual catalog of a good set of examples

That was done a couple of times before - the main downside is that it rots within a few months because it's not so much a one-off project but an ongoing maintenance exercise.
-- From mandaya at rose-hulman.edu Tue Jun 8 07:31:50 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Tue, 8 Jun 2010 07:31:50 -0500 Subject: [Swift-devel] Re: Clone provider-wonky to create a queue simulator for scheduling simulation In-Reply-To: <17371416.482721275950925883.JavaMail.root@zimbra> References: <17371416.482721275950925883.JavaMail.root@zimbra> Message-ID: I'm not sure I follow, what is provider-wonky, and where is it located for me to take a look? Arjun On Mon, Jun 7, 2010 at 5:48 PM, Michael Wilde wrote: > Arjun, > > If you pursue Mihael's suggestion from his talk last Friday on evaluating > the Swift scheduler's behavior with a simulator, you might want to clone > provider-wonky to make a provider-queuesim. > > While provider-wonky creates artificial failures, provider-queuesim could > create artificial queuing delays based on various models. > > - Mike > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Tue Jun 8 08:52:47 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 8 Jun 2010 13:52:47 +0000 (GMT) Subject: [Swift-devel] Re: Clone provider-wonky to create a queue simulator for scheduling simulation In-Reply-To: References: <17371416.482721275950925883.JavaMail.root@zimbra> Message-ID: > I'm not sure I follow, what is provider-wonky, and where is it located for > me to take a look? provider-wonky is a modified version of the local execution provider that was intended to help test Swift's reliability mechanisms. The idea is that, although it runs real programs on your local machine, it does so in a way that shows some of the problems with executing on a remote machine - for example, by causing a certain percentage of jobs to fail, or be delayed, or for certain unusual remote-site configurations to be simulated. Its in the swift SVN in the provider-wonky/ directory. 
It already has some random delay code in there, but there is definitely scope for making that more realistic - I think at present it does something like a normal distribution of delays for each place a delay can occur, which I think is probably not realistic. There doesn't seem much utility in forking queue simulation into a separate provider-queuesim vs keeping all interesting simulation behaviour all in one provider. A note I wrote about it can be found here: http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-May/003140.html That is, however, out of date - there are parameters that you can set now, through some mechanism that I do not recall. -- From wilde at mcs.anl.gov Tue Jun 8 09:52:06 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 8 Jun 2010 09:52:06 -0500 (CDT) Subject: [Swift-devel] swiftconfig In-Reply-To: Message-ID: <5764813.496651276008726211.JavaMail.root@zimbra> One way to prevent the rot may be to publish the configs out of a test suite that validates them on a daily basis. - Mike ----- "Ben Clifford" wrote: > > Justin suggested in discussion today that we start by > > > just > > > building a > > manual catalog of a good set of examples > > That was done a couple of times before - the main downside is that it > rots > within a few months because its not so much a one-off project but an > ongoing maintenance exercise. > > -- From benc at hawaga.org.uk Tue Jun 8 09:54:02 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 8 Jun 2010 14:54:02 +0000 (GMT) Subject: [Swift-devel] swiftconfig In-Reply-To: <5764813.496651276008726211.JavaMail.root@zimbra> References: <5764813.496651276008726211.JavaMail.root@zimbra> Message-ID: > One way to prevent the rot may be to publish the configs out of a test > suite that validates them on a daily basis. Right. 
That happened somewhat informally before with the entries under tests/sites/ -- From wilde at mcs.anl.gov Tue Jun 8 09:54:49 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 8 Jun 2010 09:54:49 -0500 (CDT) Subject: [Swift-devel] Re: Clone provider-wonky to create a queue simulator for scheduling simulation In-Reply-To: Message-ID: <26739034.496911276008889022.JavaMail.root@zimbra> Thanks, Ben - good info. Keeping it all in provider-wonky sounds great. - Mike ----- "Ben Clifford" wrote: > > I'm not sure I follow, what is provider-wonky, and where is it > located for > > me to take a look? > > provider-wonky is a modified version of the local execution provider > that was intended to help test Swift's reliability mechanisms. > > The idea is that, although it runs real programs on your local > machine, it > does so in a way that shows some of the problems with executing on a > remote machine - for example, by causing a certain percentage of jobs > to > fail, or be delayed, or for certain unusual remote-site configurations > to > be simulated. > > Its in the swift SVN in the provider-wonky/ directory. > > It already has some random delay code in there, but there is > definitely > scope for making that more realistic - I think at present it does > something like a normal distribution of delays for each place a delay > can > occur, which I think is probably not realistic. > > There doesn't seem much utility in forking queue simulation into a > separate provider-queuesim vs keeping all interesting simulation > behaviour > all in one provider. > > A note I wrote about it can be found here: > > http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-May/003140.html > > That is, however, out of date - there are parameters that you can set > now, > through some mechanism that I do not recall. 
> > -- -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Wed Jun 9 09:10:25 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 9 Jun 2010 09:10:25 -0500 (CDT) Subject: [Swift-devel] Use stable branch Message-ID: <18095305.539671276092625696.JavaMail.root@zimbra> Hi All, As you go through your start-up exercises, you should build Swift from source and use the stable branch. You'll find that running on multiple clusters over ssh to coasters with PBS (eg to run from bridled to PADS and TeraPort) only works in the stable branch. Ive been pointing some of you to my modified stable branch: /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin (put that at the front of your PATH) This build has a few other fixes and some useful default properties. In the coming weeks we'll work to get these fixes out into either trunk and/or stable. - Mike From wilde at mcs.anl.gov Wed Jun 9 10:19:08 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 9 Jun 2010 10:19:08 -0500 (CDT) Subject: [Swift-devel] Start up exercises In-Reply-To: <18095305.539671276092625696.JavaMail.root@zimbra> Message-ID: <6745688.544461276096748464.JavaMail.root@zimbra> David and Arjun, can you pool the configurations and procedures you used for the following initial learning exercises, and work to get them into a form where Dennis, Jon, and Thiago can use them, and in doing so, test them as general tutorial exercises? - local multicore (done by David; should add info on throttling) - ssh to multiple multicore hosts - local pbs - local pbs w/ coasters - ssh to multiple pbs-coasters These would be the first of the "profiles" we discussed on yesterday's conf call. - Mike ----- "Michael Wilde" wrote: > Hi All, > > As you go through your start-up exercises, you should build Swift from > source and use the stable branch. 
You'll find that running on multiple > clusters over ssh to coasters with PBS (eg to run from bridled to PADS > and TeraPort) only works in the stable branch. > > Ive been pointing some of you to my modified stable branch: > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin > (put that at the front of your PATH) > > This build has a few other fixes and some useful default properties. > In the coming weeks we'll work to get these fixes out into either > trunk and/or stable. > > - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From mandaya at rose-hulman.edu Wed Jun 9 17:12:22 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Wed, 9 Jun 2010 17:12:22 -0500 Subject: [Swift-devel] Re: Start up exercises In-Reply-To: <6745688.544461276096748464.JavaMail.root@zimbra> References: <18095305.539671276092625696.JavaMail.root@zimbra> <6745688.544461276096748464.JavaMail.root@zimbra> Message-ID: Sure, give me a few minutes to separate out and test my profiles. I have local pbs, local pbs /w coasters, remote via ssh to pbs + coasters (single machine), and remote via ssh (single machine). I'm still working on getting multiple machines working (because I'm getting an exception trying to run swift + coasters on teraport, and I'm trying to figure out why. I've tracked it to a message in a log file that's complaining about certificates, but those seem in order). Arjun On Wed, Jun 9, 2010 at 10:19 AM, Michael Wilde wrote: > David and Arjun, can you pool the configurations and procedures you used > for the following initial learning exercises, and work to get them into a > form where Dennis, Jon, and Thiago can use them, and in doing so, test them > as general tutorial exercises? 
> > - local multicore (done by David; should add info on throttling) > - ssh to multiple multicore hosts > - local pbs > - local pbs w/ coasters > - ssh to multiple pbs-coasters > > These would be the first of the "profiles" we discussed on yesterday's conf > call. > > - Mike > > ----- "Michael Wilde" wrote: > > > Hi All, > > > > As you go through your start-up exercises, you should build Swift from > > source and use the stable branch. You'll find that running on multiple > > clusters over ssh to coasters with PBS (eg to run from bridled to PADS > > and TeraPort) only works in the stable branch. > > > > Ive been pointing some of you to my modified stable branch: > > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin > > (put that at the front of your PATH) > > > > This build has a few other fixes and some useful default properties. > > In the coming weeks we'll work to get these fixes out into either > > trunk and/or stable. > > > > - Mike > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mandaya at rose-hulman.edu Wed Jun 9 17:45:22 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Wed, 9 Jun 2010 17:45:22 -0500 Subject: [Swift-devel] Re: Start up exercises In-Reply-To: References: <18095305.539671276092625696.JavaMail.root@zimbra> <6745688.544461276096748464.JavaMail.root@zimbra> Message-ID: Alright, here they are. I'll send out the teraport configurations once I figure them out. I'm including my tc.data file so that you don't have to sit around messing with that. Make sure to change the workdirectory tag in each sites.xml file appropriately. 
And for the last profile (profile_4, remote ssh to coasters+pbs on pads) make sure to grid-proxy-init on both bridled (or whatever other machine you're running from) and on pads (its configured to run on login1 so use that). Let me know if you need any more help. Arjun On Wed, Jun 9, 2010 at 5:12 PM, Arjun Comar wrote: > Sure, give me a few minutes to separate out and test my profiles. I have > local pbs, local pbs /w coasters, remote via ssh to pbs + coasters (single > machine), and remote via ssh (single machine). I'm still working on getting > multiple machines working (because I'm getting an exception trying to run > swift + coasters on teraport, and I'm trying to figure out why. I've tracked > it to a message in a log file that's complaining about certificates, but > those seem in order). > > Arjun > > > On Wed, Jun 9, 2010 at 10:19 AM, Michael Wilde wrote: > >> David and Arjun, can you pool the configurations and procedures you used >> for the following initial learning exercises, and work to get them into a >> form where Dennis, Jon, and Thiago can use them, and in doing so, test them >> as general tutorial exercises? >> >> - local multicore (done by David; should add info on throttling) >> - ssh to multiple multicore hosts >> - local pbs >> - local pbs w/ coasters >> - ssh to multiple pbs-coasters >> >> These would be the first of the "profiles" we discussed on yesterday's >> conf call. >> >> - Mike >> >> ----- "Michael Wilde" wrote: >> >> > Hi All, >> > >> > As you go through your start-up exercises, you should build Swift from >> > source and use the stable branch. You'll find that running on multiple >> > clusters over ssh to coasters with PBS (eg to run from bridled to PADS >> > and TeraPort) only works in the stable branch. 
>> > >> > Ive been pointing some of you to my modified stable branch: >> > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin >> > (put that at the front of your PATH) >> > >> > This build has a few other fixes and some useful default properties. >> > In the coming weeks we'll work to get these fixes out into either >> > trunk and/or stable. >> > >> > - Mike >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> > > > -- > Arjun Comar, Rose-Hulman '12 > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: profiles.zip Type: application/zip Size: 2436 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tc.data Type: application/octet-stream Size: 2162 bytes Desc: not available URL: From mandaya at rose-hulman.edu Wed Jun 9 18:37:42 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Wed, 9 Jun 2010 18:37:42 -0500 Subject: [Swift-devel] Re: Start up exercises In-Reply-To: References: <18095305.539671276092625696.JavaMail.root@zimbra> <6745688.544461276096748464.JavaMail.root@zimbra> Message-ID: Good news everyone! I got the teraport configurations working, so I added single site profiles for teraport and two multisite profiles. I'm still learning gridftp, etc., so I don't have any working OSG profiles yet. I've tested everything, so they should all work, but let me know if you have any problems. The tc.data file from my last email should work fine. Arjun On Wed, Jun 9, 2010 at 5:45 PM, Arjun Comar wrote: > Alright, here they are. I'll send out the teraport configurations once I > figure them out. I'm including my tc.data file so that you don't have to sit > around messing with that. 
Make sure to change the workdirectory tag in each > sites.xml file appropriately. And for the last profile (profile_4, remote > ssh to coasters+pbs on pads) make sure to grid-proxy-init on both bridled > (or whatever other machine you're running from) and on pads (its configured > to run on login1 so use that). Let me know if you need any more help. > > Arjun > > > On Wed, Jun 9, 2010 at 5:12 PM, Arjun Comar wrote: > >> Sure, give me a few minutes to separate out and test my profiles. I have >> local pbs, local pbs /w coasters, remote via ssh to pbs + coasters (single >> machine), and remote via ssh (single machine). I'm still working on getting >> multiple machines working (because I'm getting an exception trying to run >> swift + coasters on teraport, and I'm trying to figure out why. I've tracked >> it to a message in a log file that's complaining about certificates, but >> those seem in order). >> >> Arjun >> >> >> On Wed, Jun 9, 2010 at 10:19 AM, Michael Wilde wrote: >> >>> David and Arjun, can you pool the configurations and procedures you used >>> for the following initial learning exercises, and work to get them into a >>> form where Dennis, Jon, and Thiago can use them, and in doing so, test them >>> as general tutorial exercises? >>> >>> - local multicore (done by David; should add info on throttling) >>> - ssh to multiple multicore hosts >>> - local pbs >>> - local pbs w/ coasters >>> - ssh to multiple pbs-coasters >>> >>> These would be the first of the "profiles" we discussed on yesterday's >>> conf call. >>> >>> - Mike >>> >>> ----- "Michael Wilde" wrote: >>> >>> > Hi All, >>> > >>> > As you go through your start-up exercises, you should build Swift from >>> > source and use the stable branch. You'll find that running on multiple >>> > clusters over ssh to coasters with PBS (eg to run from bridled to PADS >>> > and TeraPort) only works in the stable branch. 
>>> > >>> > Ive been pointing some of you to my modified stable branch: >>> > /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin >>> > (put that at the front of your PATH) >>> > >>> > This build has a few other fixes and some useful default properties. >>> > In the coming weeks we'll work to get these fixes out into either >>> > trunk and/or stable. >>> > >>> > - Mike >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >>> >> >> >> -- >> Arjun Comar, Rose-Hulman '12 >> > > > > -- > Arjun Comar, Rose-Hulman '12 > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: profiles.zip Type: application/zip Size: 5382 bytes Desc: not available URL: From mandaya at rose-hulman.edu Wed Jun 9 21:23:46 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Wed, 9 Jun 2010 21:23:46 -0500 Subject: [Swift-devel] TC File Generator Message-ID: Hey all, I'm starting to play with swift across the OSG, and ran into the problem of needing to add every site to my tc.data file. So I wrote a python script that takes a sites.xml file and generates a tc.data file to accompany it. If the input file is named sites-filename.xml then the output file is named sites-filename.tc.data. The script is attached. It does basically everything I need, but I don't think it's general purpose enough to work as a maintainer script for a tc.data file (i.e. applications are hard-coded, and there's no way short of modifying the source to add new apps). At some point I'll modify it to actually be able to add apps to an existing tc.data file if David's perl config script doesn't beat me to it. Let me know what you think and if you find it useful. Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tc-generator.py Type: text/x-python Size: 1018 bytes Desc: not available URL: From hategan at mcs.anl.gov Wed Jun 9 21:24:44 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 09 Jun 2010 21:24:44 -0500 Subject: [Swift-devel] TC File Generator In-Reply-To: References: Message-ID: <1276136684.5094.1.camel@blabla2.none> There's also a lesser known secret that might work if tc.data info was to somehow be specifiable in sites.xml: sites.xml is a karajan script, so things like foreach could theoretically be used in it (if you said "" somewhere in the beginning of the file). On Wed, 2010-06-09 at 21:23 -0500, Arjun Comar wrote: > Hey all, > I'm starting to play with swift across the OSG, and ran into the > problem of needing to add every site to my tc.data file. So I wrote a > python script that takes a sites.xml file and generates a tc.data file > to accompany it. If the input file is named sites-filename.xml then > the output file is named sites-filename.tc.data. The script is > attached. It does basically everything I need, but I don't think it's > general purpose enough to work as a maintainer script for a tc.data > file (i.e. applications are hard-coded, and there's no way short of > modifying the source to add new apps). At some point I'll modify it to > actually be able to add apps to an existing tc.data file if David's > perl config script doesn't beat me to it. Let me know what you think > and if you find it useful. 
> > > Arjun Comar, Rose-Hulman '12 > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Thu Jun 10 02:28:03 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 10 Jun 2010 07:28:03 +0000 (GMT) Subject: [Swift-devel] TC File Generator In-Reply-To: References: Message-ID: often the tc.data file is a mostly needless hassle, such as when pointing at stuff that is on the path and is installed on every site that you will use. but it does contain some information that cannot be automatically discovered - for example I want to use version X of software package A on one site, and version Y of software package A on another site, and those packages are not in the default path. tc.data is possibly (probably?) not the easiest way to express that (for users, rather than for the software to use to lookup). for example, you could say "mogrify and convert are in ImageMagick. and to use ImageMagick first you need to source setup.sh in its etc directory. On site A, imagemagick is at /opt/ImageMagick/1.0/ and on site B it is in /usr and on site C it is not present".". without needing to require the end user to multiple out the package description x executable lines in tc.data then you end up with something that looks a bit more like a multisite softenv, describing packages rather than individual executables. -- http://www.hawaga.org.uk/ben/ From dk0966 at cs.ship.edu Thu Jun 10 08:10:55 2010 From: dk0966 at cs.ship.edu (David Kelly) Date: Thu, 10 Jun 2010 09:10:55 -0400 Subject: [Swift-devel] Re: swiftconfig In-Reply-To: References: Message-ID: Hello, I have attached a preliminary version of swiftconfig. It is a tar.gz file and can be extracted directly within a swift directory. There's still work to do, but the main working feature right now is the templates/profiles. Arjun's profiles are working with swiftconfig. 
The profiles have been concatenated into one file called etc/sites-template.xml. The handles just had to be changed slightly to avoid duplication. Here is an example: $ swiftconfig -template teraport-remote-ssh This will grab the teraport-remote-ssh profile from etc/sites-templates.xml and update sites.xml. It looks for a working sites.xml based on either $SWIFT_HOME, the location of the swift binary or from a command line option. The command line options in the first email should mostly all be working. So to modify the working directory in the first example: $ swiftconfig -template teraport-remote-ssh -directory /var/tmp Should do the trick. If you need to change an option, you can just run it again with different options (it won't cause multiple entries). Getting rid of an entry can be done with $ swiftconfig -remove teraport-remote-ssh I added a new option, -templates, which prints a list of all available profiles. Transformation catalog should be working as described (to edit existing entries or add new ones) Still to do: Ability to edit sites.xml "profile" options with switches, an editor, web interface, ability to edit other config files like swift.properties and auth.login. Removal of entries from tc.data. More testing. Bugs: The xml module chokes if sites.xml is missing or empty. Comments get stripped. If you find others, feel free to email. Probably a good idea to backup your config files before using. David -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: swiftconfig.tar.gz Type: application/x-gzip Size: 33159 bytes Desc: not available URL: From mandaya at rose-hulman.edu Fri Jun 11 12:56:27 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Fri, 11 Jun 2010 12:56:27 -0500 Subject: [Swift-devel] Site Tester Message-ID: Hey Mihael, You mentioned that you knew the location of a site tester. Is that already in svn? If so, what's it under? 
To everybody else: Mike suggested that I try and collect together a bunch of site testers. So to that end, if you know the location of any other site testers, let me know. Thanks, Arjun -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Fri Jun 11 13:19:25 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 11 Jun 2010 13:19:25 -0500 Subject: [Swift-devel] Re: Site Tester In-Reply-To: References: Message-ID: <1276280365.30254.1.camel@blabla2.none> On Fri, 2010-06-11 at 12:56 -0500, Arjun Comar wrote: > Hey Mihael, > You mentioned that you knew the location of a site tester. Is that > already in svn? If so, what's it under? There's a basic thing in bin/checksites.k written in Karajan. It goes through a sites file, tries some basic stuff and handles timeouts. But you may want to write something in Java or so. Mihael From hategan at mcs.anl.gov Fri Jun 11 19:23:24 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 11 Jun 2010 19:23:24 -0500 Subject: [Swift-devel] Re: Site Tester In-Reply-To: <1276280365.30254.1.camel@blabla2.none> References: <1276280365.30254.1.camel@blabla2.none> Message-ID: <1276302204.1856.3.camel@blabla2.none> Also, I'm using the following for I2U2: https://trac.ci.uchicago.edu/i2u2/browser/branches/1.3/common/src/jsp/monitor/monitor.k https://trac.ci.uchicago.edu/i2u2/browser/branches/1.3/common/src/jsp/monitor/tests.k It generates machine friendly output which is used by: http://www18.i2u2.org/elab/cosmic/monitor/ (you should probably not try that with I.E.) Mihael On Fri, 2010-06-11 at 13:19 -0500, Mihael Hategan wrote: > On Fri, 2010-06-11 at 12:56 -0500, Arjun Comar wrote: > > Hey Mihael, > > You mentioned that you knew the location of a site tester. Is that > > already in svn? If so, what's it under? > > There's a basic thing in bin/checksites.k written in Karajan. It goes > through a sites file, tries some basic stuff and handles timeouts. 
> > But you may want to write something in Java or so. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Mon Jun 14 10:49:01 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 14 Jun 2010 10:49:01 -0500 (CDT) Subject: [Swift-devel] Fwd: [pads-notify] Changes to scheduling policy In-Reply-To: Message-ID: <9889466.677331276530541931.JavaMail.root@zimbra> This is relevant to our scheduling and configuration discussions... - Mike ----- Forwarded Message ----- From: "Ti Leggett" To: pads-notify at ci.uchicago.edu Sent: Monday, June 14, 2010 10:37:00 AM GMT -06:00 US/Canada Central Subject: [pads-notify] Changes to scheduling policy After listening to the feedback from you about how jobs flow on the PADS cluster, and after some monitoring and observations by us, we've made some changes to the scheduling policy on PADS. The PADS wiki documentation should already be updated, but I'll explain it here as well. We welcome all feedback, so please don't be shy about praising or criticizing these changes. Our goal is to make PADS a useful development and analysis resource that accommodates a wide range of jobs fairly. We've increased the number of nodes in the development reservation from 3 non-gpu compute nodes to 7, bringing the total of nodes available for "development" jobs - those that are less than 1 hour - to 8: 7 non-gpu nodes and 1 gpu node. This standing reservation is in place from 8am - 7pm Monday thru Friday. Next we changed the priorities of the queues: fast: 3120 short: 2880 long: 1440 extended: 0 What do these numbers really mean? Assuming all things equal and knowing that a job's priority increases by one every minute it's in the queue it will take a job submitted to the extended queue 1 day before it has the same priority of a job submitted to the long queue. 
A job submitted to the long queue will take 1 day before it has the same priority as a job submitted to the short queue. And a job submitted to the short queue will take 4 hours before it has the same priority as a job submitted to the fast queue. And it will take 2 days, 4 hours before an extended job has the same priority as a job submitted to the fast queue. Keep in mind that these priorities are static and do not change based on how long the job sits idle in the queue, so their impact on a job's place in the queue diminishes the longer a job sits idle. The longer a job sits idle in the queue, the bigger the impact of the other 2 factors - queue time and fairshare. All these queue priorities do is give a shorter, smaller job a head start over bigger jobs, but they won't always preempt longer, bigger jobs if those jobs have been waiting in the queue for some time. Which brings us to the topic of fairshare. We've also changed the fairshare window to be the last 7 days instead of 3.5, so more history will be used for determining fairshare usage. Next, we've changed users' fairshare usage to be a target instead of a ceiling. Before, your job's priority would only be decreased if you exceeded fairshare. This is still the case, but now if you are under the fairshare target, your job priority will be increased. We've now implemented a fairshare ceiling for project utilization. Now if your project, as a whole, exceeds its fairshare usage, your job's priority will be decreased, but not as much as if you, as a user, exceed your fairshare usage. The project ceiling is also higher than the per user target. Before, it was common to see job priorities in the range from -85,000 to +6,000. This seemed a bit excessive. With these new policies in place the range is -600 to +3,000 for the same queued jobs. We hope that these changes will make your PADS experience better overall.
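The catch-up times quoted in the announcement follow directly from the differences between the queue priorities. Here is a minimal sketch in Python, assuming (as the announcement states) that a queued job gains one priority point per minute and that fairshare and all other factors are equal. The function and variable names are illustrative only and are not part of any PADS or scheduler tool.

```python
# Static queue priorities as listed in the PADS announcement.
QUEUE_PRIORITY = {"fast": 3120, "short": 2880, "long": 1440, "extended": 0}

MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def catch_up_minutes(lower, higher):
    """Minutes a job in queue `lower` must wait to reach the starting
    priority of queue `higher`, assuming +1 priority per queued minute."""
    return QUEUE_PRIORITY[higher] - QUEUE_PRIORITY[lower]


def as_days_hours(minutes):
    """Split a minute count into whole days and leftover whole hours."""
    days, remainder = divmod(minutes, MINUTES_PER_DAY)
    return days, remainder // MINUTES_PER_HOUR


# Reproduce the four comparisons made in the announcement.
for lower, higher in [("extended", "long"), ("long", "short"),
                      ("short", "fast"), ("extended", "fast")]:
    days, hours = as_days_hours(catch_up_minutes(lower, higher))
    print(f"{lower} -> {higher}: {days} day(s), {hours} hour(s)")
```

The differences (1440, 1440, 240 and 3120 minutes) correspond to the 1 day, 1 day, 4 hours and 2 days 4 hours given in the announcement. Fairshare and accumulated queue time are deliberately left out, so this only describes the static head start.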
_______________________________________________ pads-notify mailing list pads-notify at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/pads-notify -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From mandaya at rose-hulman.edu Tue Jun 15 12:38:39 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Tue, 15 Jun 2010 12:38:39 -0500 Subject: [Swift-devel] TC File Generator 2.0 Message-ID: Hey all, I got a little annoyed with the fact that applications were hard coded into the tc-generator I sent out last week, so I changed it up real fast to read applications from a plaintext file. Usage: /path/to/tc-generator.py sites.xml looks as usual, apps.txt should just be a list of applications like so: app_name /path/to/app Don't use any spaces in the app_name or in the path, and everything is cool. For reference, here's the apps.txt I'm working with right now: echo /bin/echo cat /bin/cat ls /bin/ls grep /bin/grep sort /bin/sort paste /bin/paste Mihael: If you explain how to work the tc specification into the sites.xml file (I assume it involves mucking with swift itself?), I can update this tool for that. David: Let me know how you want to go about merging this into swiftconfig. Perhaps swiftconfig should just call this? Let me know what you think. Let me know if anyone has any problems with this. Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tc-generator.py Type: text/x-python Size: 1048 bytes Desc: not available URL: From hategan at mcs.anl.gov Tue Jun 15 12:41:50 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 15 Jun 2010 12:41:50 -0500 Subject: [Swift-devel] Re: TC File Generator 2.0 In-Reply-To: References: Message-ID: <1276623710.2860.0.camel@blabla2.none> On Tue, 2010-06-15 at 12:38 -0500, Arjun Comar wrote: > Mihael: If you explain how to work the tc specification into the > sites.xml file (I assume it involves mucking with swift itself?), I > can update this tool for that. It's not there yet. I just thought at various points in time that it should be. From mandaya at rose-hulman.edu Tue Jun 15 13:16:47 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Tue, 15 Jun 2010 13:16:47 -0500 Subject: [Swift-devel] Re: TC File Generator 2.0 In-Reply-To: <1276623710.2860.0.camel@blabla2.none> References: <1276623710.2860.0.camel@blabla2.none> Message-ID: Ah, alright. Well, if a decision is made to move in that direction, I'll update the tool to match that then. Arjun On Tue, Jun 15, 2010 at 12:41 PM, Mihael Hategan wrote: > On Tue, 2010-06-15 at 12:38 -0500, Arjun Comar wrote: > > > Mihael: If you explain how to work the tc specification into the > > sites.xml file (I assume it involves mucking with swift itself?), I > > can update this tool for that. > > It's not there yet. I just thought at various points in time that it > should be. > > -- Arjun Comar, Rose-Hulman '12 -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Wed Jun 16 11:02:56 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 16 Jun 2010 11:02:56 -0500 (CDT) Subject: [Swift-devel] Swift configuration interface Message-ID: <22726276.770581276704176196.JavaMail.root@zimbra> David, Wenjun, and Tom, I wanted to introduce you all and start a discussion of the Swift configuration interface. 
Wenjun and Tom, can you prepare a web page or PPT (on the SWFT wiki if no better place is apparent) that introduces your general science gateway, maybe shows a set of screenshots, and tells a user how to try it? Regarding the Swift configuration interface, specifically: - can you suggest to David a palette of tools (web server, interface builder, etc.) that you think might be both productive and compatible with your portal? (At least in the sense of minimizing the number of new technologies that we jointly require). - Mark Hereld raised a good point when I explained that the complexity of a command line interface was driving us to a web interface. He suggested that the "svn like" approach of a set of subcommands that maintain a database might help reduce that complexity, so that no single command needs to be very complex, but together, the commands enable you to do everything needed. *One* example of this might be to maintain for each user a small SQLite database of config information, along with perhaps a global one (part of the release, or fetched via the web), from which purpose-built sites and tc files are generated. Then users could say things like: swift cf add PADS Fusion swift cf apps blast dock convert swift cf set throttle 200 swift run myscript.swift Perhaps another thought to pursue. This would be similar, David, to the command syntax you proposed, with the key aspect being that it is stateful and can generate a variety of configs. Both routes (web and cmd line) have merits, but perhaps this again lets us defer the web a bit, and also helps us better define the underlying model behind both.
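To make the "stateful subcommands" idea a little more concrete, here is a rough sketch in Python of what a per-user SQLite config store might look like. Everything in it is an assumption for illustration: the table layout, the function names (cf_add, cf_apps, cf_set, site_app_pairs) and the in-memory database are hypothetical, and no such "swift cf" command exists yet. It only shows how repeated commands could update stored state from which site/app combinations are later derived. Writing real sites.xml and tc.data files from that state is left out.

```python
import sqlite3


def open_config(path=":memory:"):
    """Open (or create) the hypothetical per-user config database."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS sites (name TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS apps (name TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT);
    """)
    return db


def cf_add(db, *sites):
    """Rough analogue of 'swift cf add PADS Fusion'; re-adding is a no-op."""
    db.executemany("INSERT OR IGNORE INTO sites VALUES (?)",
                   [(s,) for s in sites])


def cf_apps(db, *apps):
    """Rough analogue of 'swift cf apps blast dock convert'."""
    db.executemany("INSERT OR IGNORE INTO apps VALUES (?)",
                   [(a,) for a in apps])


def cf_set(db, key, value):
    """Rough analogue of 'swift cf set throttle 200'; overwrites old value."""
    db.execute("INSERT OR REPLACE INTO settings VALUES (?, ?)",
               (key, str(value)))


def site_app_pairs(db):
    """Every (site, app) combination a generated tc file would need to cover."""
    return db.execute(
        "SELECT s.name, a.name FROM sites s CROSS JOIN apps a "
        "ORDER BY s.name, a.name").fetchall()


db = open_config()
cf_add(db, "PADS", "Fusion")
cf_apps(db, "blast", "dock", "convert")
cf_set(db, "throttle", 200)
cf_add(db, "PADS")          # running a command again does not duplicate entries
pairs = site_app_pairs(db)
throttle = db.execute(
    "SELECT value FROM settings WHERE key = 'throttle'").fetchone()[0]
print(pairs, throttle)
```

The point of keeping the state in a database rather than in the generated files is that each command stays small, and the same stored state could feed either a command line front end or a web interface.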
- Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From dk0966 at cs.ship.edu Wed Jun 16 18:22:42 2010 From: dk0966 at cs.ship.edu (David Kelly) Date: Wed, 16 Jun 2010 19:22:42 -0400 Subject: [Swift-devel] Re: Swift configuration interface In-Reply-To: <22726276.770581276704176196.JavaMail.root@zimbra> References: <22726276.770581276704176196.JavaMail.root@zimbra> Message-ID: On Wed, Jun 16, 2010 at 12:02 PM, Michael Wilde wrote: *One* example of this might be to maintain for each user a small SQLite > database of config information, along with perhaps a global one (part of the > release, or fetched via the web), from which purpose-built sites and tc > files were generated. > > Then users could say things like: > > swift cf add PADS Fusion > swift cf apps blast dock convert > swift cf set throttle 200 > swift run myscript.swift > Both routes (web and cmd line) have merits, but perhaps this again lets us > defer the web a bit, and also helps us better model the underlying model > behind both. > I like this idea and can see how it will make things simpler. I will shift my focus from the user interfaces to a database driven swiftconfig. I'll send out an updated swiftconfig for testing as soon as possible. Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Jun 17 10:38:39 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Jun 2010 10:38:39 -0500 (CDT) Subject: [Swift-devel] May need VOMS proxy for many OSG sites In-Reply-To: Message-ID: <29165947.811051276789119044.JavaMail.root@zimbra> Arjun, this may be the reason that your access to many OSG sites is failing. Find a site that fails using grid-proxy-init from say teraport. Then try that same site, using voms-proxy-init (sp?) on engage-login. 
We'll both need to dig into the full meaning of a "VOMS" proxy, but basically it appends extra "role" information to the proxy to indicate that you are acting as a member of a specific VO (in your case, the "engage" VO). I don't recall if we added that to Swift yet (I think not). Mihael, do you recall? If not, you'll need to do more of the initial testing from engage-login until we install OSG clients. - Mike ----- Forwarded Message ----- From: "Brian Bockelman" To: "Robert Engel" Cc: "Keith Chadwick" , "Iwona Sakrejda" , OSG-int at opensciencegrid.org, OSG-VO-FORUM at opensciencegrid.org, "Arvind Gopu" , "Rob Quick" Sent: Thursday, June 17, 2010 2:44:16 AM GMT -06:00 US/Canada Central Subject: Re: How to know if a site requires a VOMS Proxy or a Grid Proxy for authentication? On Jun 17, 2010, at 12:39 AM, Robert Engel wrote: > Keith, > > thanks for the link. But that is what I meant by manually knocking on each door. As an OSG user I want a simple way to find out what proxy to use on each of the potential 50+ resources there are. > Use a VOMS proxy. Didn't we just determine they are a superset of grid proxies? Reading through the thread, I didn't see any site saying "I accept grid proxies but not VOMS proxies." Ultimately, there are a million things that can go wrong in distributed computing (cosmic rays hitting fiber cables at FNAL). Why concentrate on this one? I'm not against having better probes or tests - but we have extremely limited effort. I'd rather identify the areas where we need this the most. The only way to know if a site accepts your jobs is to submit jobs. Why should we add central complexity instead of using auto-discovery (esp. since the central view, whether MyOSG, BDII, etc., is always going to be wrong as they don't use your proxy)? We are a decentralized, distributed computing facility. You can't have centralized information that's "correct" if you have a decentralized computing system.
Brian > I am thinking that myOSG could provide the required proxy information for each of the resources. Perhaps Arvind and Rob can comment on that. > > Robert > > > > Keith Chadwick wrote: >> At 3:17 PM -0700 6/16/10, Robert Engel wrote: >>> Hey Iwona, >>> >>> currently I recommend in the documentation to always check with the membership VO if they support VOMS and provide a VOMS server. Just as you said, the VOMS proxy in the end is just a 'fancy' grid proxy and can be used as such. I recommend using the VOMS Proxy under this circumstances. >>> >>> On the other hand I would like users who can't generate a VOMS Proxy with extended attributes to know if a certain site requires such without having to 'knock on every door' manually? Like for instance at Fermilab where this is required. I only know it is required because I talked to Burt. Otherwise I would have no idea. >> >> The requirement for voms proxies is explicitly published in the >> FermiGrid policy document: >> >> http://fermigrid.fnal.gov/policy.html >> >> Direct quote from the above document: >> >> VOs and VO members that desire to Fermilab grid resources must initialize >> their credentials using: >> >> * $VDT_LOCATION/voms/bin/voms-proxy-init >> >> Those VOs and VO members that fail to use voms-proxy-init may be blocked >> from accessing Fermilab grid resources. >> >> -Keith. >> >>> Thanks, >>> Robert >>> >>> Iwona Sakrejda wrote: >>>> But even not all the sites that run GUMS servers requirer VOMS proxy. >>>> >>>> So I'd say - if a proxy is rejected by a site, is the error message clear? I never tried.... >>>> >>>> Also the user should check with the VO. If a vo is utilizing functionality that comes with >>>> a VOMS proxy, it will be presumably educating its users about available roles and such, no? >>>> >>>> Always asking for a VOMS proxy is safer. If no VOMS server available - it will be reduced to >>>> a regular proxy. 
If a site is using map files, the extra stuff will be ignored and the proxy will >>>> work anyway. >>>> >>>> Isn't it so? >>>> >>>> Iwona >>>> >>>> On Wed, Jun 16, 2010 at 2:57 PM, Robert Engel wrote: >>>> >>>> Steven, >>>> >>>> Do you know how a user could find out what RSV probes are >>>> running on any given site? I tried to find this in myOSG, but >>>> nothing turned up. >>>> >>>> Thanks, >>>> Robert >>>> >>>> >>>> Steven Timm wrote: >>>> >>>> The answer is not always a clear yes or no. If a site copies >>>> the OSG GUMS template and runs GUMS then they will end up >>>> requiring voms proxies for about half of the VO's and not >>>> for the other half. >>>> You could indirectly find out by which RSV probes any given site >>>> is running, GUMS sites run different RSV probes than grid-mapfile >>>> sites do. by default all grid-mapfile sites do not require >>>> any VOMS proxy. >>>> >>>> FermiGrid is the only site I know of that requires VOMS proxy for >>>> everyone and even we have a way to make exceptions if necessary. >>>> >>>> Steve >>>> >>>> >>>> On Wed, 16 Jun 2010, Robert Engel wrote: >>>> >>>> Hello, >>>> >>>> I am writing documentation for end users. I would like to >>>> write how a user can find out if a site accepts a Grid >>>> Proxy or requires a VOMS Proxy. Can that information be >>>> found in myOSG? >>>> >>>> Thanks, >>>> Robert >>>> > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From mandaya at rose-hulman.edu Thu Jun 17 10:48:00 2010 From: mandaya at rose-hulman.edu (Arjun Comar) Date: Thu, 17 Jun 2010 10:48:00 -0500 Subject: [Swift-devel] Re: May need VOMS proxy for many OSG sites In-Reply-To: <29165947.811051276789119044.JavaMail.root@zimbra> References: <29165947.811051276789119044.JavaMail.root@zimbra> Message-ID: Ah, I'll play with that then.
Is the procedure for that documented on the Engage VO site? And what does it mean if it's not added into Swift, that I won't be able to run on sites that require it? Or does it operate like grid-proxy-init in that once I apply it to a site, I'm good to go? Arjun On Thu, Jun 17, 2010 at 10:38 AM, Michael Wilde wrote: > Arjun, this may be the reason that your access to many OSG sites is > failing. > > Find a site that fails using grid-proxy-init from say teraport. > Then try that same site, using voms-proxy-init (sp?) on engage-login. > > We'll both need to dig into the full meaning of a "VOMS" proxy, but > basically it appends extra "role" information to the proxy to indicate that > you are activing as a member of a specific VO (in your case, the "engage" > VO). > > I dont recall if we added that to Swift yet (I think not). Mihael, do you > recal? > > If not, you'll need to do more of the initial testing from engage-login > until we instal; OSG clients. > > - Mike > > ----- Forwarded Message ----- > From: "Brian Bockelman" > To: "Robert Engel" > Cc: "Keith Chadwick" , "Iwona Sakrejda" < > isakrejda at lbl.gov>, OSG-int at opensciencegrid.org, > OSG-VO-FORUM at opensciencegrid.org, "Arvind Gopu" , "Rob > Quick" > Sent: Thursday, June 17, 2010 2:44:16 AM GMT -06:00 US/Canada Central > Subject: Re: How to know if a site requires a VOMS Proxy or a Grid Proxy > for authentication? > > > On Jun 17, 2010, at 12:39 AM, Robert Engel wrote: > > > Keith, > > > > thanks for the link. But that is what I meant by manually knocking on > each door. As an OSG user I want a simple way to find out what proxy to use > on each of the potential 50+ resources there are. > > > > Use a VOMS proxy. Didn't we just determine they are a superset of grid > proxies? Reading through the thread, I didn't see any site saying "I accept > grid proxies but not VOMS proxies." 
> > Ultimately, there are a million things that can go wrong in distributed > computing (cosmic rays hitting fiber cables at FNAL). Why concentrate on > this one? I'm not against having better probes or tests - but we have > extremely limited effort. I'd rather identify the areas where we need this > the most. > > The only way to know if a site accepts your jobs are to submit jobs. Why > should we add central complexity instead of using auto-discovery (esp since > the central view, whether MyOSG, BDII, etc, is always going to be wrong as > they don't use your proxy)? > > We are a decentralized, distributed computing facility. You can't have > centralized information that's "correct" if you have a decentralized > computing system. > > Brian > > > I am thinking that myOSG could provide the required proxy information for > each of the resources. Perhaps Arvind and Rob can comment on that. > > > > Robert > > > > > > > > Keith Chadwick wrote: > >> At 3:17 PM -0700 6/16/10, Robert Engel wrote: > >>> Hey Iwona, > >>> > >>> currently I recommend in the documentation to always check with the > membership VO if they support VOMS and provide a VOMS server. Just as you > said, the VOMS proxy in the end is just a 'fancy' grid proxy and can be used > as such. I recommend using the VOMS Proxy under this circumstances. > >>> > >>> On the other hand I would like users who can't generate a VOMS Proxy > with extended attributes to know if a certain site requires such without > having to 'knock on every door' manually? Like for instance at Fermilab > where this is required. I only know it is required because I talked to Burt. > Otherwise I would have no idea. 
> >> > >> The requirement for voms proxies is explicitly published in the > >> FermiGrid policy document: > >> > >> http://fermigrid.fnal.gov/policy.html > >> > >> Direct quote from the above document: > >> > >> VOs and VO members that desire to Fermilab grid resources must > initialize > >> their credentials using: > >> > >> * $VDT_LOCATION/voms/bin/voms-proxy-init > >> > >> Those VOs and VO members that fail to use voms-proxy-init may be > blocked > >> from accessing Fermilab grid resources. > >> > >> -Keith. > >> > >>> Thanks, > >>> Robert > >>> > >>> Iwona Sakrejda wrote: > >>>> But even not all the sites that run GUMS servers requirer VOMS proxy. > >>>> > >>>> So I'd say - if a proxy is rejected by a site, is the error message > clear? I never tried.... > >>>> > >>>> Also the user should check with the VO. If a vo is utilizing > functionality that comes with > >>>> a VOMS proxy, it will be presumably educating its users about > available roles and such, no? > >>>> > >>>> Always asking for a VOMS proxy is safer. If no VOMS server available - > it will be reduced to > >>>> a regular proxy. If a site is using map files, the extra stuff will be > ignored and the proxy will > >>>> work anyway. > >>>> > >>>> Isn't it so? > >>>> > >>>> Iwona > >>>> > >>>> On Wed, Jun 16, 2010 at 2:57 PM, Robert Engel < > engel_r at ligo.caltech.edu > wrote: > >>>> > >>>> Steven, > >>>> > >>>> ? Do you know how a user could find out what RSV probes are > >>>> running on any given site? I tried to find this in myOSG, but > >>>> nothing turned up. > >>>> > >>>> Thanks, > >>>> Robert > >>>> > >>>> > >>>> Steven Timm wrote: > >>>> > >>>> The answer is not always a clear yes or no. ?If a site copies > >>>> the OSG GUMS template and runs GUMS then they will end up > >>>> requiring voms proxies for about half of the VO's and not > >>>> for the other half. 
> >>>> You could indirectly find out by which RSV probes any given site > >>>> is running, GUMS sites run different RSV probes than grid-mapfile > >>>> sites do. by default all grid-mapfile sites do not require > >>>> any VOMS proxy. > >>>> > >>>> FermiGrid is the only site I know of that requires VOMS proxy for > >>>> everyone and even we have a way to make exceptions if necessary. > >>>> > >>>> Steve > >>>> > >>>> > >>>> On Wed, 16 Jun 2010, Robert Engel wrote: > >>>> > >>>> Hello, > >>>> > >>>> I am writing documentation for end users. I would like to > >>>> write how a user can find out if a site accepts a Grid > >>>> Proxy or requires a VOMS Proxy. Can that information be > >>>> found in myOSG? > >>>> > >>>> Thanks, > >>>> Robert > >>>> > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Arjun Comar, Rose-Hulman '12 From wilde at mcs.anl.gov Thu Jun 17 10:56:37 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Thu, 17 Jun 2010 10:56:37 -0500 (CDT) Subject: [Swift-devel] svn dirs for apps and tools In-Reply-To: <12649091.812301276789866361.JavaMail.root@zimbra> Message-ID: <24690231.812991276790197458.JavaMail.root@zimbra> Dear Students, at the top of the swift svn repository, there are directories:
  SwiftApps -> add Montage here
  usertools/swift -> add swiftconfig etc here
I will make sure you can all commit here. Please commit *only* to these directories for now, not in the main trunk or branch. Note that under usertools/swift there are some old experiments I did on a set of tools like swiftrun, etc.
You might get some ideas from these, but David's project - with input from all of us - involves a fresh look at these and a clean set of user-ready tools that we'll install into the swift/bin directory (from the main swift trunk or branch). So track the tools here for now until they are ready to move to trunk and then branch. When Mihael comes back to Swift work (shortly) we'll get the stable branch stable again and decide what from trunk should move there. - Mike From wilde at mcs.anl.gov Thu Jun 17 11:37:06 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Jun 2010 11:37:06 -0500 (CDT) Subject: [Swift-devel] Re: May need VOMS proxy for many OSG sites In-Reply-To: Message-ID: <29521753.816501276792626269.JavaMail.root@zimbra> Arjun, one thing I should clarify: since I have not seen the errors you are getting, I'm only suggesting that this *may* explain it, not that it *does*. If not, please gather the errors you are getting for various grid operations and send them to swift-devel. - Mike ----- "Arjun Comar" wrote: > Ah, I'll play with that then. Is the procedure for that documented on > the Engage VO site? And what does it mean if it's not added into > Swift, that I won't be able to run on sites that require it? Or does > it operate like grid-proxy-init in that once I apply it to a site, I'm > good to go? > > Arjun > > > On Thu, Jun 17, 2010 at 10:38 AM, Michael Wilde < wilde at mcs.anl.gov > > wrote: > > > Arjun, this may be the reason that your access to many OSG sites is > failing. > > Find a site that fails using grid-proxy-init from say teraport. > Then try that same site, using voms-proxy-init (sp?) on engage-login. > > We'll both need to dig into the full meaning of a "VOMS" proxy, but > basically it appends extra "role" information to the proxy to indicate > that you are acting as a member of a specific VO (in your case, the > "engage" VO). > > I don't recall if we added that to Swift yet (I think not). Mihael, do > you recall?
> > If not, you'll need to do more of the initial testing from > engage-login until we instal; OSG clients. > > - Mike > > ----- Forwarded Message ----- > From: "Brian Bockelman" < bbockelm at cse.unl.edu > > To: "Robert Engel" < engel_r at ligo.caltech.edu > > Cc: "Keith Chadwick" < chadwick at fnal.gov >, "Iwona Sakrejda" < > isakrejda at lbl.gov >, OSG-int at opensciencegrid.org , > OSG-VO-FORUM at opensciencegrid.org , "Arvind Gopu" < agopu at indiana.edu > >, "Rob Quick" < rquick at iupui.edu > > Sent: Thursday, June 17, 2010 2:44:16 AM GMT -06:00 US/Canada Central > Subject: Re: How to know if a site requires a VOMS Proxy or a Grid > Proxy for authentication? > > > On Jun 17, 2010, at 12:39 AM, Robert Engel wrote: > > > Keith, > > > > thanks for the link. But that is what I meant by manually knocking > on each door. As an OSG user I want a simple way to find out what > proxy to use on each of the potential 50+ resources there are. > > > > Use a VOMS proxy. Didn't we just determine they are a superset of grid > proxies? Reading through the thread, I didn't see any site saying "I > accept grid proxies but not VOMS proxies." > > Ultimately, there are a million things that can go wrong in > distributed computing (cosmic rays hitting fiber cables at FNAL). Why > concentrate on this one? I'm not against having better probes or tests > - but we have extremely limited effort. I'd rather identify the areas > where we need this the most. > > The only way to know if a site accepts your jobs are to submit jobs. > Why should we add central complexity instead of using auto-discovery > (esp since the central view, whether MyOSG, BDII, etc, is always going > to be wrong as they don't use your proxy)? > > We are a decentralized, distributed computing facility. You can't have > centralized information that's "correct" if you have a decentralized > computing system. 
> > Brian > > > I am thinking that myOSG could provide the required proxy > information for each of the resources. Perhaps Arvind and Rob can > comment on that. > > > > Robert > > > > > > > > Keith Chadwick wrote: > >> At 3:17 PM -0700 6/16/10, Robert Engel wrote: > >>> Hey Iwona, > >>> > >>> currently I recommend in the documentation to always check with > the membership VO if they support VOMS and provide a VOMS server. Just > as you said, the VOMS proxy in the end is just a 'fancy' grid proxy > and can be used as such. I recommend using the VOMS Proxy under this > circumstances. > >>> > >>> On the other hand I would like users who can't generate a VOMS > Proxy with extended attributes to know if a certain site requires such > without having to 'knock on every door' manually? Like for instance at > Fermilab where this is required. I only know it is required because I > talked to Burt. Otherwise I would have no idea. > >> > >> The requirement for voms proxies is explicitly published in the > >> FermiGrid policy document: > >> > >> http://fermigrid.fnal.gov/policy.html > >> > >> Direct quote from the above document: > >> > >> VOs and VO members that desire to Fermilab grid resources must > initialize > >> their credentials using: > >> > >> * $VDT_LOCATION/voms/bin/voms-proxy-init > >> > >> Those VOs and VO members that fail to use voms-proxy-init may be > blocked > >> from accessing Fermilab grid resources. > >> > >> -Keith. > >> > >>> Thanks, > >>> Robert > >>> > >>> Iwona Sakrejda wrote: > >>>> But even not all the sites that run GUMS servers requirer VOMS > proxy. > >>>> > >>>> So I'd say - if a proxy is rejected by a site, is the error > message clear? I never tried.... > >>>> > >>>> Also the user should check with the VO. If a vo is utilizing > functionality that comes with > >>>> a VOMS proxy, it will be presumably educating its users about > available roles and such, no? > >>>> > >>>> Always asking for a VOMS proxy is safer. 
If no VOMS server > available - it will be reduced to > >>>> a regular proxy. If a site is using map files, the extra stuff > will be ignored and the proxy will > >>>> work anyway. > >>>> > >>>> Isn't it so? > >>>> > >>>> Iwona > >>>> > >>>> On Wed, Jun 16, 2010 at 2:57 PM, Robert Engel < > engel_r at ligo.caltech.edu > wrote: > >>>> > >>>> Steven, > >>>> > >>>> ? Do you know how a user could find out what RSV probes are > >>>> running on any given site? I tried to find this in myOSG, but > >>>> nothing turned up. > >>>> > >>>> Thanks, > >>>> Robert > >>>> > >>>> > >>>> Steven Timm wrote: > >>>> > >>>> The answer is not always a clear yes or no. ?If a site copies > >>>> the OSG GUMS template and runs GUMS then they will end up > >>>> requiring voms proxies for about half of the VO's and not > >>>> for the other half. > >>>> You could indirectly find out by which RSV probes any given site > >>>> is running, GUMS sites run different RSV probes than grid-mapfile > >>>> sites do. ?by default all grid-mapfile sites do not require > >>>> any VOMS proxy. > >>>> > >>>> FermiGrid is the only site I know of that requires VOMS proxy for > >>>> everyone and even we have a way to make exceptions if necessary. > >>>> > >>>> Steve > >>>> > >>>> > >>>> On Wed, 16 Jun 2010, Robert Engel wrote: > >>>> > >>>> Hello, > >>>> > >>>> ?I am writing documentation for end users. I would like to > >>>> write how a user can find out if a site accepts a Grid > >>>> Proxy or requires a VOMS Proxy. Can that information be > >>>> found in myOSG? 
> >>>> > >>>> Thanks, > >>>> Robert > >>>> > >>>> > >>> > >>> > >>> > >>> Attachment converted: Macintosh HD:engel_r 18.vcf (TEXT/ttxt) > (0040AFA0) > >> > >> > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > > > -- > Arjun Comar, Rose-Hulman '12 -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From benc at hawaga.org.uk Thu Jun 17 11:46:28 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 17 Jun 2010 16:46:28 +0000 (GMT) Subject: [Swift-devel] May need VOMS proxy for many OSG sites In-Reply-To: <29165947.811051276789119044.JavaMail.root@zimbra> References: <29165947.811051276789119044.JavaMail.root@zimbra> Message-ID: this was my understanding when I left: i) anywhere that takes a grid-proxy-init proxy will take a voms-proxy-init proxy, as brian bockelman notes. ii) a voms proxy is a grid proxy with an additional annotation, as you describe. iii) osg sites which use grid map files will generally accept either kind. osg sites which use VOMS directly require a voms proxy. iv) I considered the issue of packaging voms-proxy-init in the broader context of making swift work in harmony with the OSG software stack. To that end, I experimented with making swift installable through pacman (one of the earlier releases has that noted on the download page, I think) so that you could install it along with an osg stack. That would give you, for example, voms-proxy-init, but also things like OSG's CA collection. 
-- http://www.hawaga.org.uk/ben/ From aespinosa at cs.uchicago.edu Thu Jun 17 14:49:01 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Jun 2010 14:49:01 -0500 Subject: [Swift-devel] May need VOMS proxy for many OSG sites In-Reply-To: <29165947.811051276789119044.JavaMail.root@zimbra> References: <29165947.811051276789119044.JavaMail.root@zimbra> Message-ID: Some sites ignore VOMS information (which should not be the case). I was running jobs the other day on BNL-ATLAS with my proxy including Engage voms attributes. I got permission errors in accessing engage-user files because my jobs were being executed on the 'osgedu' VO. I still have a couple of open tickets in OSG support because of that. What I suggest is that you query sites that support the VO you think your proxy defaults to (without the VOMS attributes). Then incrementally check sites per batch of VOs. -Allan 2010/6/17 Michael Wilde : > Arjun, this may be the reason that your access to many OSG sites is failing. > > Find a site that fails using grid-proxy-init from say teraport. > Then try that same site, using voms-proxy-init (sp?) on engage-login. > > We'll both need to dig into the full meaning of a "VOMS" proxy, but basically it appends extra "role" information to the proxy to indicate that you are acting as a member of a specific VO (in your case, the "engage" VO). > > I don't recall if we added that to Swift yet (I think not). Mihael, do you recall? > > If not, you'll need to do more of the initial testing from engage-login until we install OSG clients. > > - Mike > > ----- Forwarded Message ----- > From: "Brian Bockelman" > To: "Robert Engel" > Cc: "Keith Chadwick" , "Iwona Sakrejda" , OSG-int at opensciencegrid.org, OSG-VO-FORUM at opensciencegrid.org, "Arvind Gopu" , "Rob Quick" > Sent: Thursday, June 17, 2010 2:44:16 AM GMT -06:00 US/Canada Central > Subject: Re: How to know if a site requires a VOMS Proxy or a Grid Proxy for authentication?
> > > On Jun 17, 2010, at 12:39 AM, Robert Engel wrote: > >> Keith, >> >> thanks for the link. But that is what I meant by manually knocking on each door. As an OSG user I want a simple way to find out what proxy to use on each of the potential 50+ resources there are. >> > > Use a VOMS proxy. Didn't we just determine they are a superset of grid proxies? Reading through the thread, I didn't see any site saying "I accept grid proxies but not VOMS proxies." > > Ultimately, there are a million things that can go wrong in distributed computing (cosmic rays hitting fiber cables at FNAL). Why concentrate on this one? I'm not against having better probes or tests - but we have extremely limited effort. I'd rather identify the areas where we need this the most. > > The only way to know if a site accepts your jobs is to submit jobs. Why should we add central complexity instead of using auto-discovery (esp since the central view, whether MyOSG, BDII, etc, is always going to be wrong as they don't use your proxy)? > > We are a decentralized, distributed computing facility. You can't have centralized information that's "correct" if you have a decentralized computing system. > > Brian > >> I am thinking that myOSG could provide the required proxy information for each of the resources. Perhaps Arvind and Rob can comment on that. >> >> Robert >> >> >> >> Keith Chadwick wrote: >>> At 3:17 PM -0700 6/16/10, Robert Engel wrote: >>>> Hey Iwona, >>>> >>>> currently I recommend in the documentation to always check with the membership VO if they support VOMS and provide a VOMS server. Just as you said, the VOMS proxy in the end is just a 'fancy' grid proxy and can be used as such. I recommend using the VOMS Proxy under these circumstances. >>>> >>>> On the other hand I would like users who can't generate a VOMS Proxy with extended attributes to know if a certain site requires such without having to 'knock on every door' manually?
Like for instance at Fermilab where this is required. I only know it is required because I talked to Burt. Otherwise I would have no idea. >>> >>> The requirement for voms proxies is explicitly published in the >>> FermiGrid policy document: >>> >>> http://fermigrid.fnal.gov/policy.html >>> >>> Direct quote from the above document: >>> >>> VOs and VO members that desire to Fermilab grid resources must initialize >>> their credentials using: >>> >>> * $VDT_LOCATION/voms/bin/voms-proxy-init >>> >>> Those VOs and VO members that fail to use voms-proxy-init may be blocked >>> from accessing Fermilab grid resources. >>> >>> -Keith. >>> >>>> Thanks, >>>> Robert >>>> >>>> Iwona Sakrejda wrote: >>>>> But even not all the sites that run GUMS servers require VOMS proxy. >>>>> >>>>> So I'd say - if a proxy is rejected by a site, is the error message clear? I never tried.... >>>>> >>>>> Also the user should check with the VO. If a vo is utilizing functionality that comes with >>>>> a VOMS proxy, it will be presumably educating its users about available roles and such, no? >>>>> >>>>> Always asking for a VOMS proxy is safer. If no VOMS server available - it will be reduced to >>>>> a regular proxy. If a site is using map files, the extra stuff will be ignored and the proxy will >>>>> work anyway. >>>>> >>>>> Isn't it so? >>>>> >>>>> Iwona >>>>> >>>>> On Wed, Jun 16, 2010 at 2:57 PM, Robert Engel wrote: >>>>> >>>>> Steven, >>>>> >>>>> Do you know how a user could find out what RSV probes are >>>>> running on any given site? I tried to find this in myOSG, but >>>>> nothing turned up. >>>>> >>>>> Thanks, >>>>> Robert >>>>> >>>>> >>>>> Steven Timm wrote: >>>>> >>>>> The answer is not always a clear yes or no. If a site copies >>>>> the OSG GUMS template and runs GUMS then they will end up >>>>> requiring voms proxies for about half of the VO's and not >>>>> for the other half.
>>>>> You could indirectly find out by which RSV probes any given site >>>>> is running, GUMS sites run different RSV probes than grid-mapfile >>>>> sites do. by default all grid-mapfile sites do not require >>>>> any VOMS proxy. >>>>> >>>>> FermiGrid is the only site I know of that requires VOMS proxy for >>>>> everyone and even we have a way to make exceptions if necessary. >>>>> >>>>> Steve >>>>> >>>>> >>>>> On Wed, 16 Jun 2010, Robert Engel wrote: >>>>> >>>>> Hello, >>>>> >>>>> I am writing documentation for end users. I would like to >>>>> write how a user can find out if a site accepts a Grid >>>>> Proxy or requires a VOMS Proxy. Can that information be >>>>> found in myOSG? >>>>> >>>>> Thanks, >>>>> Robert >>>>> > From benc at hawaga.org.uk Thu Jun 17 14:56:15 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 17 Jun 2010 19:56:15 +0000 (GMT) Subject: [Swift-devel] May need VOMS proxy for many OSG sites In-Reply-To: References: <29165947.811051276789119044.JavaMail.root@zimbra> Message-ID: > Some sites ignore VOMS information (which should not be the case). I can imagine that corresponding with the sites that use grid map files rather than VOMS - if you are in multiple VOs you end up being mapped (from user perspective) non-deterministically to one of the VOs - the one that is (first/last) in the gridmap file, probably. -- From wilde at mcs.anl.gov Thu Jun 17 15:09:32 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Thu, 17 Jun 2010 15:09:32 -0500 (CDT) Subject: [Swift-devel] Fwd: How to know if a site requires a VOMS Proxy or a Grid Proxy for authentication?
In-Reply-To: <20734869.831371276805251176.JavaMail.root@zimbra> Message-ID: <8069806.831521276805372198.JavaMail.root@zimbra> More from the OSG thread, related to the discussion on VOMS proxy issues. The full thread should be somewhere below here: http://listserv.fnal.gov/scripts/wa.exe?A1=ind1006c&L=osg-int ----- Forwarded Message ----- From: "Robert Engel" To: "Brian Bockelman" Cc: "Keith Chadwick" , "Iwona Sakrejda" , OSG-int at opensciencegrid.org, OSG-VO-FORUM at opensciencegrid.org, "Arvind Gopu" , "Rob Quick" Sent: Thursday, June 17, 2010 2:58:06 PM GMT -06:00 US/Canada Central Subject: Re: How to know if a site requires a VOMS Proxy or a Grid Proxy for authentication? Hey Brian, you misunderstood me. I am concerned about users of VOs that do not provide a VOMS Server and that can not generate proxies with extended attributes. The voms proxy without attributes will be of little use if the remote site ( for instance Fermilab ) requires it. My initial goal was to direct the user to some information that would allow him to find out what the remote site requires. Otherwise the only way to find out for a user is to try and open a ticket if he fails in all possible ways (grid-proxy, voms-proxy w/o attributes, voms-proxy with attributes ). Robert Brian Bockelman wrote: > On Jun 17, 2010, at 12:39 AM, Robert Engel wrote: > > >> Keith, >> >> thanks for the link. But that is what I meant by manually knocking on each door. As an OSG user I want a simple way to find out what proxy to use on each of the potential 50+ resources there are. >> >> > > Use a VOMS proxy. Didn't we just determine they are a superset of grid proxies? Reading through the thread, I didn't see any site saying "I accept grid proxies but not VOMS proxies." > > Ultimately, there are a million things that can go wrong in distributed computing (cosmic rays hitting fiber cables at FNAL). Why concentrate on this one? I'm not against having better probes or tests - but we have extremely limited effort. 
I'd rather identify the areas where we need this the most. > > The only way to know if a site accepts your jobs are to submit jobs. Why should we add central complexity instead of using auto-discovery (esp since the central view, whether MyOSG, BDII, etc, is always going to be wrong as they don't use your proxy)? > > We are a decentralized, distributed computing facility. You can't have centralized information that's "correct" if you have a decentralized computing system. > > Brian > > >> I am thinking that myOSG could provide the required proxy information for each of the resources. Perhaps Arvind and Rob can comment on that. >> >> Robert >> >> >> >> Keith Chadwick wrote: >> >>> At 3:17 PM -0700 6/16/10, Robert Engel wrote: >>> >>>> Hey Iwona, >>>> >>>> currently I recommend in the documentation to always check with the membership VO if they support VOMS and provide a VOMS server. Just as you said, the VOMS proxy in the end is just a 'fancy' grid proxy and can be used as such. I recommend using the VOMS Proxy under this circumstances. >>>> >>>> On the other hand I would like users who can't generate a VOMS Proxy with extended attributes to know if a certain site requires such without having to 'knock on every door' manually? Like for instance at Fermilab where this is required. I only know it is required because I talked to Burt. Otherwise I would have no idea. >>>> >>> The requirement for voms proxies is explicitly published in the >>> FermiGrid policy document: >>> >>> http://fermigrid.fnal.gov/policy.html >>> >>> Direct quote from the above document: >>> >>> VOs and VO members that desire to Fermilab grid resources must initialize >>> their credentials using: >>> >>> * $VDT_LOCATION/voms/bin/voms-proxy-init >>> >>> Those VOs and VO members that fail to use voms-proxy-init may be blocked >>> from accessing Fermilab grid resources. >>> >>> -Keith. 
>>> >>> >>>> Thanks, >>>> Robert >>>> >>>> Iwona Sakrejda wrote: >>>> >>>>> But even not all the sites that run GUMS servers requirer VOMS proxy. >>>>> >>>>> So I'd say - if a proxy is rejected by a site, is the error message clear? I never tried.... >>>>> >>>>> Also the user should check with the VO. If a vo is utilizing functionality that comes with >>>>> a VOMS proxy, it will be presumably educating its users about available roles and such, no? >>>>> >>>>> Always asking for a VOMS proxy is safer. If no VOMS server available - it will be reduced to >>>>> a regular proxy. If a site is using map files, the extra stuff will be ignored and the proxy will >>>>> work anyway. >>>>> >>>>> Isn't it so? >>>>> >>>>> Iwona >>>>> >>>>> On Wed, Jun 16, 2010 at 2:57 PM, Robert Engel > wrote: >>>>> >>>>> Steven, >>>>> >>>>> ? Do you know how a user could find out what RSV probes are >>>>> running on any given site? I tried to find this in myOSG, but >>>>> nothing turned up. >>>>> >>>>> Thanks, >>>>> Robert >>>>> >>>>> >>>>> Steven Timm wrote: >>>>> >>>>> The answer is not always a clear yes or no. ?If a site copies >>>>> the OSG GUMS template and runs GUMS then they will end up >>>>> requiring voms proxies for about half of the VO's and not >>>>> for the other half. >>>>> You could indirectly find out by which RSV probes any given site >>>>> is running, GUMS sites run different RSV probes than grid-mapfile >>>>> sites do. ?by default all grid-mapfile sites do not require >>>>> any VOMS proxy. >>>>> >>>>> FermiGrid is the only site I know of that requires VOMS proxy for >>>>> everyone and even we have a way to make exceptions if necessary. >>>>> >>>>> Steve >>>>> >>>>> >>>>> On Wed, 16 Jun 2010, Robert Engel wrote: >>>>> >>>>> Hello, >>>>> >>>>> ?I am writing documentation for end users. I would like to >>>>> write how a user can find out if a site accepts a Grid >>>>> Proxy or requires a VOMS Proxy. Can that information be >>>>> found in myOSG? 
>>>>> >>>>> Thanks, >>>>> Robert >>>>> > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Thu Jun 17 16:42:13 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Jun 2010 16:42:13 -0500 (CDT) Subject: [Swift-devel] swift-plot-log Message-ID: <26491235.838771276810933578.JavaMail.root@zimbra> The students may find this helpful: http://www.ci.uchicago.edu/swift/guides/log-processing.php Note that we are debugging problems in swift-plot-log at the moment. Arjun, please post the log file location (ideally the shortest possible one) and the error you are getting, to this list. From wilde at mcs.anl.gov Thu Jun 17 16:54:08 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Thu, 17 Jun 2010 16:54:08 -0500 (CDT) Subject: [Swift-devel] ADEM - a tool for grid software distribution In-Reply-To: <32219915.838931276811341226.JavaMail.root@zimbra> Message-ID: <33090816.839241276811648258.JavaMail.root@zimbra> This tool, developed by Zhengxiong Hou, may be a good source of ideas and components to do site testing. There's a doc directory https://trac.ci.uchicago.edu/swift/browser/SwiftApps/adem-osg There's a paper on this at: http://www.mcs.anl.gov/uploads/cels/papers/P1659.pdf (perhaps another copy somewhere has better figure resolution) Another tool for site checking is vds-check-sites (if we can locate a copy in an old VDS release) or the pegasus equivalent pegasus-check-sites which still might be available in a recent Pegasus release (but doesn't seem to be referenced in any documents, so it might be deprecated). But I'm still wondering if there was some more pragmatic version of this that was used by several people in the past few years.

- Mike

From wilde at mcs.anl.gov Thu Jun 17 17:55:16 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 17 Jun 2010 17:55:16 -0500 (CDT)
Subject: [Swift-devel] Some examples of swift-plot-log and how it relates to swift processing
Message-ID: <2318687.842271276815316577.JavaMail.root@zimbra>

...is in this (old) posting from Ben, which shows many of the tool's plot types:

http://mail.ci.uchicago.edu/pipermail/swift-devel/2008-October/003950.html

Searching the swift-devel archives will turn up many other examples.

- Mike

From dk0966 at cs.ship.edu Mon Jun 21 09:49:41 2010
From: dk0966 at cs.ship.edu (David Kelly)
Date: Mon, 21 Jun 2010 10:49:41 -0400
Subject: [Swift-devel] Re: Swift configuration interface
In-Reply-To:
References: <22726276.770581276704176196.JavaMail.root@zimbra>
Message-ID:

Hello,

Here are the plans for a database-driven swiftconfig. Data will be stored in a SQLite database file in swift/etc. It will contain global information and local information. The global information will be a set of common tc entries and site configurations. It will include the profile/template information for multiple sites.
It will be stored in the following tables:

sqlite> .schema sites_default
CREATE TABLE sites_default(
    name varchar(256) primary key,
    filesys_provider varchar(256),
    filesys_url varchar(256),
    execution_provider varchar(256),
    execution_url varchar(256),
    execution_jobmanager varchar(256),
    workdirectory varchar(256)
);

sqlite> .schema sites_profiles_default
CREATE TABLE sites_profiles_default(
    name varchar(256) primary key,
    namespace varchar(256),
    key varchar(256),
    value varchar(256)
);

sqlite> .schema tc_default
CREATE TABLE tc_default(
    sitename varchar(256),
    transformation varchar(256),
    path varchar(256),
    installed varchar(256),
    platform varchar(256)
);

sqlite> .schema tc_profiles_default
CREATE TABLE tc_profiles_default(
    name varchar(256),
    profile varchar(256)
);

This default data will not be modified by swiftconfig, but will instead serve as a basis for customization. The custom/local information will be stored in similar tables without the "default" suffix (sites, sites_profiles, tc, tc_profiles).

Swiftconfig will generate the live configuration files tc.data and sites.xml. It will determine their location by checking, in this order: a specific file location given to swiftconfig, the location identified by $SWIFT_HOME, and the location of 'swift' in $PATH.

Here are some examples of how it could run:

$ swiftconfig add teraport
This will take data from sites_default and sites_profiles_default, generate the XML, and add it to sites.xml.

$ swiftconfig set maxtime 5000
Modifies data in the local tables (sites and sites_profiles), regenerates the XML, and modifies sites.xml.

$ swiftconfig remove teraport
Removes the teraport config from sites and sites_profiles (keeping the original profile unmodified).

$ swiftconfig apps convert composite
Transfers data from tc_default to tc and updates tc.data.

Writing the application in Python could help resolve some of the issues I was having with Perl.
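
As a rough illustration of the "swiftconfig add teraport" step described above, here is a minimal sketch that reads one row from sites_default and turns it into a sites.xml-style entry. It uses only the Python standard library. The helper name site_to_pool, the row values for teraport, and the exact element and attribute names of the generated XML (pool, filesystem, execution, workdirectory) are assumptions made for the example, not the actual swiftconfig code or a confirmed sites.xml schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Table layout copied from the proposal above.
SCHEMA = """
CREATE TABLE sites_default(
    name varchar(256) primary key,
    filesys_provider varchar(256),
    filesys_url varchar(256),
    execution_provider varchar(256),
    execution_url varchar(256),
    execution_jobmanager varchar(256),
    workdirectory varchar(256)
);
"""


def site_to_pool(conn, name):
    """Build an XML element for one site (hypothetical helper).

    The element and attribute names are assumed for illustration.
    """
    row = conn.execute(
        "SELECT filesys_provider, filesys_url, execution_provider, "
        "execution_url, execution_jobmanager, workdirectory "
        "FROM sites_default WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError("no default entry for site %r" % name)
    fs_provider, fs_url, ex_provider, ex_url, ex_jobmanager, workdir = row

    pool = ET.Element("pool", handle=name)
    ET.SubElement(pool, "filesystem", provider=fs_provider, url=fs_url)
    ET.SubElement(pool, "execution", provider=ex_provider, url=ex_url,
                  jobmanager=ex_jobmanager)
    ET.SubElement(pool, "workdirectory").text = workdir
    return pool


# In-memory database with one made-up row, standing in for the real file.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO sites_default VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("teraport", "gsiftp", "example.org", "gt2", "example.org", "pbs",
     "/tmp/swiftwork"),
)

pool = site_to_pool(conn, "teraport")
xml_text = ET.tostring(pool, encoding="unicode")
print(xml_text)
```

A real tool would also merge the rows from sites_profiles_default into the entry and append the result to the existing sites.xml rather than printing it.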
Python has all the database, XML (as well as web and GUI for later) modules I would need included by default, instead of having to include compiled modules like DBI and DBD::sqlite in swift/lib.

Do these changes simplify things enough? Perhaps we could talk more about it today on the call (or maybe on a separate call, depending on how long it would take).

From hategan at mcs.anl.gov Mon Jun 21 11:01:06 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 21 Jun 2010 11:01:06 -0500
Subject: [Swift-devel] Re: Swift configuration interface
In-Reply-To:
References: <22726276.770581276704176196.JavaMail.root@zimbra>
Message-ID: <1277136066.3882.2.camel@blabla2.none>

Given that the end product is pretty much in XML format, wouldn't it make more sense to store the information in XML and extract parts of that?

Mihael

From dk0966 at cs.ship.edu Mon Jun 21 11:41:28 2010
From: dk0966 at cs.ship.edu (David Kelly)
Date: Mon, 21 Jun 2010 12:41:28 -0400
Subject: [Swift-devel] Re: Swift configuration interface
In-Reply-To: <1277136066.3882.2.camel@blabla2.none>
References: <22726276.770581276704176196.JavaMail.root@zimbra> <1277136066.3882.2.camel@blabla2.none>
Message-ID:

On Mon, Jun 21, 2010 at 12:01 PM, Mihael Hategan wrote:

> Given that the end product is pretty much in XML format, wouldn't it
> make more sense to store the information in XML and extract parts of
> that?

That is basically how the original program I wrote works: it stores all the sites data in an XML template file and changes attributes as needed to produce working config files. Some of the feedback I received on that suggested it might be better to store the data in a database. Do you prefer the previous style and syntax?

I feel like I'm kind of struggling to find a way to make things simpler (and useful) without a GUI.

David

From hategan at mcs.anl.gov Mon Jun 21 13:39:34 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 21 Jun 2010 13:39:34 -0500
Subject: [Swift-devel] Re: Swift configuration interface
In-Reply-To:
References: <22726276.770581276704176196.JavaMail.root@zimbra> <1277136066.3882.2.camel@blabla2.none>
Message-ID: <1277145574.4729.3.camel@blabla2.none>

On Mon, 2010-06-21 at 12:41 -0400, David Kelly wrote:
> On Mon, Jun 21, 2010 at 12:01 PM, Mihael Hategan wrote:
>
> > Given that the end product is pretty much in XML format, wouldn't it
> > make more sense to store the information in XML and extract parts of
> > that?
>
> That is basically how the original program I wrote works, by storing
> all the sites data in an XML template file and changing attributes as
> needed into working config files. Some of the feedback I received from
> that suggested it might be better to store in a database. Do you
> prefer the previous style and syntax?

If there were a storage choice between XML and a database in this particular case, I would go with XML, since the source and destination formats are close and because it would keep dependencies low. But that's just me.

> I feel like I'm kind of struggling to find a way to make things
> simpler (and useful) without a GUI.

Right, though I think that the user interface is a separate issue from the backend.

Mihael

From hategan at mcs.anl.gov Wed Jun 23 22:06:00 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 23 Jun 2010 22:06:00 -0500
Subject: [Swift-devel] ssh proxy forwarding
Message-ID: <1277348760.10510.1.camel@blabla2.none>

Some initial code for GSI proxy forwarding for the SSH provider has been committed. This fakes delegation and allows a coaster service to be started through the ssh provider without having to have a proxy on the remote site. A proxy on the client side is still needed.

cog r2775 that is.

Mihael

From skenny at uchicago.edu Mon Jun 28 14:11:44 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 28 Jun 2010 14:11:44 -0500
Subject: [Swift-devel] coasters and java on queen bee
Message-ID:

anyone else get this using coasters on queen bee?

STDOUT: Warning: -jar not understood. Ignoring.
Exception in thread "main" java.lang.NoClassDefFoundError: .tmp.bootstrap.Z29712
   at gnu.gcj.runtime.FirstThread.run() (/usr/lib64/libgcj.so.5.0.0)
   at _Jv_ThreadRun(java.lang.Thread) (/usr/lib64/libgcj.so.5.0.0)
   at _Jv_RunMain(java.lang.Class, byte const, int, byte const, boolean) (/usr/lib64/libgcj.so.5.0.0)
   at __gcj_personality_v0 (/home/skenny/java.version=1.4.2)
   at __libc_start_main (/lib64/tls/libc-2.3.4.so)
   at _Jv_RegisterClasses (/home/skenny/java.version=1.4.2)

not sure why it's showing my java version as 1.4.2...when i'm logged into the headnode (or a worker node) my java version is much newer:

[skenny at qb1 ~]$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

i also tried adding JAVA_HOME to my .bashrc on the site which had no effect.

any idea how i can get swift to use the right java?

using a slightly older swift (swift-r3116 cog-r2482)...don't know if this has maybe been fixed in a later version (?)

thanks
~sk

From hategan at mcs.anl.gov Mon Jun 28 14:30:50 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Jun 2010 14:30:50 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To:
References:
Message-ID: <1277753450.29785.1.camel@blabla2.none>

Have you tried setting JAVA_HOME in tc.data or sites.xml? I have a suspicion that might work.

From skenny at uchicago.edu Mon Jun 28 14:34:07 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 28 Jun 2010 14:34:07 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To: <1277753450.29785.1.camel@blabla2.none>
References: <1277753450.29785.1.camel@blabla2.none>
Message-ID:

tc.data had no effect...will try sites

On Mon, Jun 28, 2010 at 2:30 PM, Mihael Hategan wrote:
> Have you tried setting JAVA_HOME in tc.data or sites.xml? I have a
> suspicion that might work.

From hategan at mcs.anl.gov Mon Jun 28 14:40:53 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Jun 2010 14:40:53 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To:
References: <1277753450.29785.1.camel@blabla2.none>
Message-ID: <1277754053.29917.0.camel@blabla2.none>

Ok. That might not work either.

However, you may try to add the proper java executable to the PATH. I think that might work better.

On Mon, 2010-06-28 at 14:34 -0500, Sarah Kenny wrote:
> tc.data had no effect...will try sites

From skenny at uchicago.edu Mon Jun 28 14:45:08 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 28 Jun 2010 14:45:08 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To:
References: <1277753450.29785.1.camel@blabla2.none>
Message-ID:

darn...no luck with sites.xml either

On Mon, Jun 28, 2010 at 2:34 PM, Sarah Kenny wrote:
> tc.data had no effect...will try sites

From skenny at uchicago.edu Tue Jun 29 09:50:21 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 29 Jun 2010 09:50:21 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To: <1277754053.29917.0.camel@blabla2.none>
References: <1277753450.29785.1.camel@blabla2.none> <1277754053.29917.0.camel@blabla2.none>
Message-ID:

nope :(

On Mon, Jun 28, 2010 at 2:40 PM, Mihael Hategan wrote:
> Ok. That might not work either.
>
> However, you may try to add the proper java executable to the PATH. I
> think that might work better.
?at _Jv_RegisterClasses (/home/skenny/java.version=1.4.2) >> >> >> >> not sure why it's showing my java version as 1.4.2...when i'm logged >> >> into the headnode (or a worker node) my java version is much newer: >> >> >> >> [skenny at qb1 ~]$ java -version >> >> java version "1.6.0_20" >> >> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >> >> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >> >> >> >> i also tried adding JAVA_HOME to my .bashrc on the site which had no effect. >> >> >> >> any idea how i can get swift to use the right java? >> >> >> >> using a slightly older swift (swift-r3116 cog-r2482)...don't know if >> >> this has maybe been fixed in a later version (?) >> >> >> >> >> >> thanks >> >> ~sk >> >> _______________________________________________ >> >> Swift-devel mailing list >> >> Swift-devel at ci.uchicago.edu >> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > >> > >> > > > > From wilde at mcs.anl.gov Tue Jun 29 10:39:03 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 29 Jun 2010 10:39:03 -0500 (CDT) Subject: [Swift-devel] coasters and java on queen bee In-Reply-To: Message-ID: <3325503.1218841277825943880.JavaMail.root@zimbra> It seems like some interaction between coaster bootstrap, softenv (or lack thereof) and site conventions is causing gcj to be used. That seems to be in your PATH and you can try to explicitly put the correct JAVA in your PATH in .bashrc and/or .profile. Setting JAVA_HOME alone wont cause the correct Java to be run. - Mike ----- "Sarah Kenny" wrote: > nope :( > > On Mon, Jun 28, 2010 at 2:40 PM, Mihael Hategan > wrote: > > Ok. That might not work either. > > > > However, you may try to add the proper java executable to the PATH. > I > > think that might work better. 
> > > > On Mon, 2010-06-28 at 14:34 -0500, Sarah Kenny wrote: > >> tc.data had no effect...will try sites > >> > >> On Mon, Jun 28, 2010 at 2:30 PM, Mihael Hategan > wrote: > >> > Have you tried setting JAVA_HOME in tc.data or sites.xml? I have > a > >> > suspicion that might work. > >> > > >> > On Mon, 2010-06-28 at 14:11 -0500, Sarah Kenny wrote: > >> >> anyone else get this using coasters on queen bee? > >> >> > >> >> STDOUT: Warning: -jar not understood. Ignoring. > >> >> Exception in thread "main" java.lang.NoClassDefFoundError: > .tmp.bootstrap.Z29712 > >> >> ? ?at gnu.gcj.runtime.FirstThread.run() > (/usr/lib64/libgcj.so.5.0.0) > >> >> ? ?at _Jv_ThreadRun(java.lang.Thread) > (/usr/lib64/libgcj.so.5.0.0) > >> >> ? ?at _Jv_RunMain(java.lang.Class, byte const, int, byte const, > >> >> boolean) (/usr/lib64/libgcj.so.5.0.0) > >> >> ? ?at __gcj_personality_v0 (/home/skenny/java.version=1.4.2) > >> >> ? ?at __libc_start_main (/lib64/tls/libc-2.3.4.so) > >> >> ? ?at _Jv_RegisterClasses (/home/skenny/java.version=1.4.2) > >> >> > >> >> not sure why it's showing my java version as 1.4.2...when i'm > logged > >> >> into the headnode (or a worker node) my java version is much > newer: > >> >> > >> >> [skenny at qb1 ~]$ java -version > >> >> java version "1.6.0_20" > >> >> Java(TM) SE Runtime Environment (build 1.6.0_20-b02) > >> >> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) > >> >> > >> >> i also tried adding JAVA_HOME to my .bashrc on the site which > had no effect. > >> >> > >> >> any idea how i can get swift to use the right java? > >> >> > >> >> using a slightly older swift (swift-r3116 cog-r2482)...don't > know if > >> >> this has maybe been fixed in a later version (?) 
> >> >> > >> >> > >> >> thanks > >> >> ~sk > >> >> _______________________________________________ > >> >> Swift-devel mailing list > >> >> Swift-devel at ci.uchicago.edu > >> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > >> > > >> > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Tue Jun 29 11:10:20 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 29 Jun 2010 11:10:20 -0500 Subject: [Swift-devel] spread changing or lots of big jobs left behind Message-ID: Based on the attached screenshot, swift now requested for 1036 nodes which is much greater than 198 of the maxnodes. The swift session log reports 1450 jobs submitted. I would have expected ~10360 jobs submitted since i use the default overallocation factor of 10. During the first few hours/ minutes of the workflow, swift was still submitting small jobs (10~20 workers per slot) along with the large request. Based on this observation, does swift request the entire spread per batch? Also, how does jobThrottle now factor into submitted jobs and corresponding slots? Latest session status: Progress: Initializing:7675 Submitted:1450 Failed:89 Finished successfully:7811 My foreach.max.threads=73 which translates to 5,329 concurrent jobs at a time. Here's my sites.xml entry: 14400 198 0.8 10 true short 1500.0 1.98 /gpfs/teraport/OSG/data/aespinosa/swift_scratch thanks, -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... 
Name: spread.png
Type: image/png
Size: 8234 bytes
Desc: not available
URL:

From skenny at uchicago.edu Tue Jun 29 11:21:52 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 29 Jun 2010 11:21:52 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To: <3325503.1218841277825943880.JavaMail.root@zimbra>
References: <3325503.1218841277825943880.JavaMail.root@zimbra>
Message-ID:

pre-pending the path via .bashrc, tc.data or sites.xml seems to have no effect...i was under the impression that others were successfully running coasters on queen bee, so thought someone might've dealt with this already...but perhaps i should try poking the loni admins to see if they can shed some light.

thanks
~sk

On Tue, Jun 29, 2010 at 10:39 AM, Michael Wilde wrote:
> It seems like some interaction between coaster bootstrap, softenv (or
> lack thereof) and site conventions is causing gcj to be used. That
> seems to be in your PATH and you can try to explicitly put the correct
> Java in your PATH in .bashrc and/or .profile. Setting JAVA_HOME alone
> won't cause the correct Java to be run.
>
> - Mike

From wilde at mcs.anl.gov Tue Jun 29 11:28:15 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 29 Jun 2010 11:28:15 -0500 (CDT)
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To:
Message-ID: <31773529.1222441277828895568.JavaMail.root@zimbra>

Assuming you are starting coasters via GT2, I would suggest a few experiments with globus-job-run to see how your path is being set up. Similarly if you are running locally or via ssh.

I had similar problems on Abe quite a while ago, which is configured much like QueenBee, but much has changed since then, both in coasters and possibly on the systems.

I suspect we need to debug a bit more ourselves before we go to the sysadmins.

- Mike

----- "Sarah Kenny" wrote:

> pre-pending the path via .bashrc, tc.data or sites.xml seems to have no
> effect...i was under the impression that others were successfully
> running coasters on queen bee, so thought someone might've dealt with
> this already...but perhaps i should try poking the loni admins to see
> if they can shed some light.
>
> thanks
> ~sk

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Jun 29 11:33:30 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 29 Jun 2010 11:33:30 -0500
Subject: [Swift-devel] spread changing or lots of big jobs left behind
In-Reply-To:
References:
Message-ID: <1277829210.2442.2.camel@blabla2.none>

On Tue, 2010-06-29 at 11:10 -0500, Allan Espinosa wrote:
> Based on the attached screenshot, swift now requested for 1036
nodes > which is much greater than 198 of the maxnodes. Maxnodes applies to one block. I suppose there should be maxNodes and maxBlockNodes. > The swift session log > reports 1450 jobs submitted. I would have expected ~10360 jobs > submitted since i use the default overallocation factor of 10. The overallocation applies in time not in breadth. > > During the first few hours/ minutes of the workflow, swift was still > submitting small jobs (10~20 workers per slot) along with the large > request. Based on this observation, does swift request the entire > spread per batch? Each round of block allocation will consider the available slots and the spread. > > Also, how does jobThrottle now factor into submitted jobs and > corresponding slots? It's orthogonal. Swift will submit jobs as it normally would. From aespinosa at cs.uchicago.edu Tue Jun 29 11:45:09 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 29 Jun 2010 11:45:09 -0500 Subject: [Swift-devel] spread changing or lots of big jobs left behind In-Reply-To: <1277829210.2442.2.camel@blabla2.none> References: <1277829210.2442.2.camel@blabla2.none> Message-ID: So if i want total workers in all blocks to be ~198, i can use the formula for an arithmetic series S_n = n/2 (a_1 + a_n) 198 = 10/2 (a_1+ a_10) a_1 + a_10 = 39.6 so having 38 or 40 for maxNodes can do the trick right? Thanks, -Allan 2010/6/29 Mihael Hategan : > > Maxnodes applies to one block. > > I suppose there should be maxNodes and maxBlockNodes. > From wilde at mcs.anl.gov Tue Jun 29 12:32:32 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Tue, 29 Jun 2010 12:32:32 -0500 (CDT) Subject: [Swift-devel] coasters and java on queen bee In-Reply-To: <30231604.1224901277832052274.JavaMail.root@zimbra> Message-ID: <22341076.1225191277832752761.JavaMail.root@zimbra> With my login, globus-job-run gets gcj (see below). When I login to QueenBee, my default .soft entries give me Sun Java 1.6. 
When I use globus-job-run, neither .profile nor .bashrc seems to be run (as I placed PATH= statements in both).

Mihael, is there any other provision to find the right Java for coaster service startup?

- Mike

---

Running from PADS:

login1$ globus-job-run grid-qb.loni-lsu.teragrid.org:2120/jobmanager-fork /bin/sh -c 'echo $PATH'
/usr/local/bin:/bin:/usr/bin
login1$
login1$ globus-job-run grid-qb.loni-lsu.teragrid.org:2120/jobmanager-fork /bin/sh -c "type java"
java is /usr/bin/java
login1$
login1$ globus-job-run grid-qb.loni-lsu.teragrid.org:2120/jobmanager-fork /bin/sh -c "java -version"
java version "1.4.2"
gcj (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

---

----- "Michael Wilde" wrote:

> Assuming you are starting coasters via GT2, I would suggest a few
> experiments with globus-job-run to see how your path is being set up.
>
> Similarly if you are running locally or via ssh.
>
> I had similar problems on Abe quite a while ago, which is configured
> much like QueenBee, but much has changed since then both in coasters
> and possibly on the systems.
>
> I suspect we need to debug a bit more ourselves before we go to the
> sysadmins.
>
> - Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Jun 29 13:04:56 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 29 Jun 2010 13:04:56 -0500
Subject: [Swift-devel]
 coasters and java on queen bee
In-Reply-To: <22341076.1225191277832752761.JavaMail.root@zimbra>
References: <22341076.1225191277832752761.JavaMail.root@zimbra>
Message-ID: <1277834696.3323.2.camel@blabla2.none>

On Tue, 2010-06-29 at 12:32 -0500, wilde at mcs.anl.gov wrote:
> With my login, globus-job-run gets gcj (see below). When I login to QueenBee, my default .soft entries give me Sun Java 1.6.
>
> When I use globus-job-run, neither .profile nor .bashrc seems to be run (as I placed PATH= statements in both).
>
> Mihael, is there any other provision to find the right Java for coaster service startup?

The addition of it to PATH should be sufficient as far as I can tell.

I can try to debug it.

Skenny, can you pass me your sites file?

Mihael

From wilde at mcs.anl.gov Tue Jun 29 15:45:54 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 29 Jun 2010 15:45:54 -0500 (CDT)
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To: <1277834696.3323.2.camel@blabla2.none>
Message-ID: <9217991.1236241277844354307.JavaMail.root@zimbra>

I added this line to sites.xml, and now I see a PBS job queued on QueenBee, which I *think* means that the coaster service was able to start:

/usr/local/compilers/jdk/bin

- Mike

----- "Mihael Hategan" wrote:

> On Tue, 2010-06-29 at 12:32 -0500, wilde at mcs.anl.gov wrote:
> > With my login, globus-job-run gets gcj (see below). When I login to
> > QueenBee, my default .soft entries give me Sun Java 1.6.
> >
> > When I use globus-job-run, neither .profile nor .bashrc seems to be
> > run (as I placed PATH= statements in both).
> >
> > Mihael, is there any other provision to find the right Java for
> > coaster service startup?
>
> The addition of it to PATH should be sufficient as far as I can tell.
>
> I can try to debug it.
>
> Skenny, can you pass me your sites file?
>
> Mihael

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Tue Jun 29 16:26:53 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 29 Jun 2010 16:26:53 -0500 (CDT)
Subject: [Swift-devel] Logging changes
Message-ID:

Hello

I just committed some changes in trunk that change the way log output
looks by default. The overall idea is to make it easier for application
users to debug scripts.

All of the previous log messages can be generated by modifying
log4j.properties in the normal way to produce DEBUG level messages.
This will be necessary to make use of log processing scripts such as
the plotting tools.

See this page for more notes:
http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftLog

Justin

--
Justin M Wozniak

From dk0966 at cs.ship.edu Wed Jun 30 06:29:48 2010
From: dk0966 at cs.ship.edu (David Kelly)
Date: Wed, 30 Jun 2010 07:29:48 -0400
Subject: [Swift-devel] Re: Swift configuration interface
In-Reply-To: <1277145574.4729.3.camel@blabla2.none>
References: <22726276.770581276704176196.JavaMail.root@zimbra> <1277136066.3882.2.camel@blabla2.none> <1277145574.4729.3.camel@blabla2.none>
Message-ID:

Hello all,

Here are some updates on swiftconfig. After thinking about how to make
the command line simpler, I ended up making some changes to the old
design rather than writing a new one. It's similar to the original, but
should hopefully make things a little simpler by allowing smaller,
simpler changes. swiftconfig -add copies over the template, swiftconfig
-modify -foo bar allows the user to change one setting.

Here's an example of how it works so far:

# See what's available to use
$ swiftconfig -templates
multisite-ssh teraport pads-pbs-coasters-ssh teraport-remote-pbs-coasters-ssh teraport-local-pbs-coasters pads-local-pbs-coasters pads-remote-ssh teraport-remote-ssh pads-local-pbs teraport-local-pbs pads-pbs-multisite-coasters teraport-pbs-multisite-coasters

$ swiftconfig -add pads-remote-ssh
$ swiftconfig -modify pads-remote-ssh -directory /home/david/swiftwork
Sites.xml entry
0
/home/david/swiftwork

This is checked into SVN at
https://svn.ci.uchicago.edu/svn/vdl2/usertools/swift/swiftconfig.

I've also been doing some preliminary work on the web version. If you
want to get an idea of the interface, check
http://www.ci.uchicago.edu/~davidk/test.pl. Ideally, on the edit screen
there should be a question mark next to each option. Clicking on the
question mark would then take you to an anchor in the user guide (or
maybe a pop-up from the same page) with a detailed explanation of what
that setting actually does.

I still need to get profiles working, an integrated web server,
deletion from the web, more options, linking to the user guide, cleanup,
tuning, and more, but I just wanted to get this out to you all to see
what you thought so far.

David

From skenny at uchicago.edu Wed Jun 30 11:45:27 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 30 Jun 2010 11:45:27 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To: <9217991.1236241277844354307.JavaMail.root@zimbra>
References: <1277834696.3323.2.camel@blabla2.none> <9217991.1236241277844354307.JavaMail.root@zimbra>
Message-ID:

yeah, you definitely got further than me if you're in the queue...but i can't replicate this (even when i update to the latest swift) can you post your whole sites and tc files?
thanks ~sk On Tue, Jun 29, 2010 at 3:45 PM, Michael Wilde wrote: > I added this line to sites.xml, and now I see a PBS job queued on QueenBee, which I *think* means that the coaster service was able to start: > > /usr/local/compilers/jdk/bin > > - Mike > > > ----- "Mihael Hategan" wrote: > >> On Tue, 2010-06-29 at 12:32 -0500, wilde at mcs.anl.gov wrote: >> > With my login, globus-job-run gets gcj (see below). When I login to >> QueenBee, my default .soft entries give me Sun Java 1.6. >> > >> > When I use globus-job-run, neither .profile nor .bashrc seems to be >> run (as I placed PATH= statements in both). >> > >> > Mihael, is there any other provision to find the right Java for >> coaster service startup? >> >> The addition of it to PATH should be sufficient as far as I can tell. >> >> I can try to debug it. >> >> Skenny, can you pass me your sites file? >> >> Mihael > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > From wilde at mcs.anl.gov Wed Jun 30 11:51:22 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Wed, 30 Jun 2010 11:51:22 -0500 (CDT) Subject: [Swift-devel] coasters and java on queen bee In-Reply-To: <24213253.1260271277916503418.JavaMail.root@zimbra> Message-ID: <16195865.1260411277916682339.JavaMail.root@zimbra> Yup, job finished, so seems to have basic sanity. Sites entry is below. I used the SIDGrid TG DBS acct, and my Swift build: /home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin - Mike 8 1 1 1 .15 10000 /usr/local/compilers/jdk/bin TG-DBSACCTHERE /home/ux454325/swiftwork ----- "Sarah Kenny" wrote: > yeah, you definitely got further than me if you're in the queue...but > i can't replicate this (even when i update to the latest swift) can > you post your whole sites and tc files? 
> > thanks > ~sk > > On Tue, Jun 29, 2010 at 3:45 PM, Michael Wilde > wrote: > > I added this line to sites.xml, and now I see a PBS job queued on > QueenBee, which I *think* means that the coaster service was able to > start: > > > > key="PATHPPREFIX">/usr/local/compilers/jdk/bin > > > > - Mike > > > > > > ----- "Mihael Hategan" wrote: > > > >> On Tue, 2010-06-29 at 12:32 -0500, wilde at mcs.anl.gov wrote: > >> > With my login, globus-job-run gets gcj (see below). When I login > to > >> QueenBee, my default .soft entries give me Sun Java 1.6. > >> > > >> > When I use globus-job-run, neither .profile nor .bashrc seems to > be > >> run (as I placed PATH= statements in both). > >> > > >> > Mihael, is there any other provision to find the right Java for > >> coaster service startup? > >> > >> The addition of it to PATH should be sufficient as far as I can > tell. > >> > >> I can try to debug it. > >> > >> Skenny, can you pass me your sites file? > >> > >> Mihael > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From skenny at uchicago.edu Wed Jun 30 12:16:51 2010 From: skenny at uchicago.edu (Sarah Kenny) Date: Wed, 30 Jun 2010 12:16:51 -0500 Subject: [Swift-devel] coasters and java on queen bee In-Reply-To: <16195865.1260411277916682339.JavaMail.root@zimbra> References: <24213253.1260271277916503418.JavaMail.root@zimbra> <16195865.1260411277916682339.JavaMail.root@zimbra> Message-ID: awesome, using your sites file it's working...although, weird thing is it wasn't the java line at all. it looks like it has to do with specifying the port on the coaster line. i usually don't specify port for other sites (e.g. abe) but i tried that and it worked. 
in fact, even if i remove the PATHPREFIX setting in your sties file it still works as long as port is specified, if i remove the port specification it fails as before...not sure how/why the port affects the java version but that seems to be the case: anyway, it's running...thanks! ~sk On Wed, Jun 30, 2010 at 11:51 AM, wrote: > Yup, job finished, so seems to have basic sanity. > > Sites entry is below. I used the SIDGrid TG DBS acct, and my Swift build: > ?/home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin > > - Mike > > > ? > > ? ? > > ? ?8 > ? ?1 > ? ?1 > ? ?1 > ? ?.15 > ? ?10000 > ? ?/usr/local/compilers/jdk/bin > > ? ?TG-DBSACCTHERE > > ? ? > ? ?/home/ux454325/swiftwork > > ? > > > > > ----- "Sarah Kenny" wrote: > >> yeah, you definitely got further than me if you're in the queue...but >> i can't replicate this (even when i update to the latest swift) can >> you post your whole sites and tc files? >> >> thanks >> ~sk >> >> On Tue, Jun 29, 2010 at 3:45 PM, Michael Wilde >> wrote: >> > I added this line to sites.xml, and now I see a PBS job queued on >> QueenBee, which I *think* means that the coaster service was able to >> start: >> > >> > > key="PATHPPREFIX">/usr/local/compilers/jdk/bin >> > >> > - Mike >> > >> > >> > ----- "Mihael Hategan" wrote: >> > >> >> On Tue, 2010-06-29 at 12:32 -0500, wilde at mcs.anl.gov wrote: >> >> > With my login, globus-job-run gets gcj (see below). When I login >> to >> >> QueenBee, my default .soft entries give me Sun Java 1.6. >> >> > >> >> > When I use globus-job-run, neither .profile nor .bashrc seems to >> be >> >> run (as I placed PATH= statements in both). >> >> > >> >> > Mihael, is there any other provision to find the right Java for >> >> coaster service startup? >> >> >> >> The addition of it to PATH should be sufficient as far as I can >> tell. >> >> >> >> I can try to debug it. >> >> >> >> Skenny, can you pass me your sites file? 
>> >> >> >> Mihael >> > >> > -- >> > Michael Wilde >> > Computation Institute, University of Chicago >> > Mathematics and Computer Science Division >> > Argonne National Laboratory >> > >> > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > From wilde at mcs.anl.gov Wed Jun 30 12:39:53 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 30 Jun 2010 12:39:53 -0500 (CDT) Subject: [Swift-devel] coasters and java on queen bee In-Reply-To: Message-ID: <24009976.1262431277919593870.JavaMail.root@zimbra> Hmm. Note that the ports below get you to the GT5 GRAM service at QueenBee. One possibility is that perhaps the services differ in whether they run softenv, .bashrc, etc? Needs more experimentation to understand the discrepancies we are seeing. Maybe you can try the different URIs manually and see what PATH gets set up, if it seems that the discrepancy is related to that. - Mike ----- "Sarah Kenny" wrote: > awesome, using your sites file it's working...although, weird thing > is > it wasn't the java line at all. it looks like it has to do with > specifying the port on the coaster line. i usually don't specify port > for other sites (e.g. abe) but i tried that and it worked. in fact, > even if i remove the PATHPREFIX setting in your sties file it still > works as long as port is specified, if i remove the port > specification > it fails as before...not sure how/why the port affects the java > version but that seems to be the case: > > jobManager="gt2:gt2:PBS"/> > > url="grid-qb.loni-lsu.teragrid.org:2120" > jobmanager="gt2:gt2:pbs"/> > > anyway, it's running...thanks! > > ~sk > > On Wed, Jun 30, 2010 at 11:51 AM, wrote: > > Yup, job finished, so seems to have basic sanity. > > > > Sites entry is below. I used the SIDGrid TG DBS acct, and my Swift > build: > > ?/home/wilde/swift/src/stable/cog/modules/swift/dist/swift-svn/bin > > > > - Mike > > > > > > ? > > > > ? ? 
url="grid-qb.loni-lsu.teragrid.org:2120" jobmanager="gt2:gt2:pbs"/> > > > > ? ?8 > > ? ?1 > > ? ?1 > > ? ?1 > > ? ?.15 > > ? ?10000 > > ? ? key="PATHPPREFIX">/usr/local/compilers/jdk/bin > > > > ? ? key="project">TG-DBSACCTHERE > > > > ? ? > > ? ?/home/ux454325/swiftwork > > > > ? > > > > > > > > > > ----- "Sarah Kenny" wrote: > > > >> yeah, you definitely got further than me if you're in the > queue...but > >> i can't replicate this (even when i update to the latest swift) > can > >> you post your whole sites and tc files? > >> > >> thanks > >> ~sk > >> > >> On Tue, Jun 29, 2010 at 3:45 PM, Michael Wilde > >> wrote: > >> > I added this line to sites.xml, and now I see a PBS job queued > on > >> QueenBee, which I *think* means that the coaster service was able > to > >> start: > >> > > >> > >> key="PATHPPREFIX">/usr/local/compilers/jdk/bin > >> > > >> > - Mike > >> > > >> > > >> > ----- "Mihael Hategan" wrote: > >> > > >> >> On Tue, 2010-06-29 at 12:32 -0500, wilde at mcs.anl.gov wrote: > >> >> > With my login, globus-job-run gets gcj (see below). When I > login > >> to > >> >> QueenBee, my default .soft entries give me Sun Java 1.6. > >> >> > > >> >> > When I use globus-job-run, neither .profile nor .bashrc seems > to > >> be > >> >> run (as I placed PATH= statements in both). > >> >> > > >> >> > Mihael, is there any other provision to find the right Java > for > >> >> coaster service startup? > >> >> > >> >> The addition of it to PATH should be sufficient as far as I can > >> tell. > >> >> > >> >> I can try to debug it. > >> >> > >> >> Skenny, can you pass me your sites file? 
> >> >> > >> >> Mihael > >> > > >> > -- > >> > Michael Wilde > >> > Computation Institute, University of Chicago > >> > Mathematics and Computer Science Division > >> > Argonne National Laboratory > >> > > >> > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > >

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Wed Jun 30 13:07:25 2010
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 30 Jun 2010 13:07:25 -0500
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To: <24009976.1262431277919593870.JavaMail.root@zimbra>
References: <24009976.1262431277919593870.JavaMail.root@zimbra>
Message-ID:

one thing i can tell currently is that it's NOT running my .bashrc
using that port.

On Wed, Jun 30, 2010 at 12:39 PM, Michael Wilde wrote:
> Hmm. Note that the ports below get you to the GT5 GRAM service at QueenBee.
>
> One possibility is that perhaps the services differ in whether they run softenv, .bashrc, etc? Needs more experimentation to understand the discrepancies we are seeing.
>
> Maybe you can try the different URIs manually and see what PATH gets set up, if it seems that the discrepancy is related to that.
>
> - Mike

From hategan at mcs.anl.gov Wed Jun 30 23:12:51 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 30 Jun 2010 23:12:51 -0500
Subject: [Swift-devel] manual coasters
Message-ID: <1277957571.15423.8.camel@blabla2.none>

Manual coasters are in trunk. I did some limited testing on localhost.

The basic idea is that you say passive in sites.xml. Other than that you
may want to set workersPerNode, but the other options are useless.

Then, when swift starts the coaster service, it will print the URL of
that on stderr.

You carefully dig for worker.pl and then launch it in whatever way you
like:

worker.pl

The blockid can be whatever you want, but it can be used to group
workers in the traditional blocks. The logdir is where you want the
worker logs to go. They are all mandatory.

When workers connect to the service, the service should start shipping
jobs to them.
When the service is shut down, it will also try to shut down the workers
(they are useless anyway at that point), but it cannot control the LRM
jobs, so it may fail to do so (or rather, it is more likely to fail to
do so).

Mihael

From glen842 at uchicago.edu Wed Jun 30 13:18:35 2010
From: glen842 at uchicago.edu (Glen Hocky)
Date: Wed, 30 Jun 2010 18:18:35 -0000
Subject: [Swift-devel] coasters and java on queen bee
In-Reply-To:
References: <24009976.1262431277919593870.JavaMail.root@zimbra>
Message-ID:

Not sure if this helps, but I once had a problem with .bashrc not being
sourced on a machine so I just did something like

run.sh:

    #!/bin/bash
    # load the environment that the job manager did not set up
    source ~/.bashrc
    # run the requested command, keeping its arguments as separate words
    "$@"

and then you can do something like

globus-job-run ... run.sh COMMAND

On Wed, Jun 30, 2010 at 2:07 PM, Sarah Kenny wrote:
> one thing i can tell currently is that it's NOT running my .bashrc
> using that port.
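A slightly more defensive variant of the run.sh idea above, as a minimal sketch rather than anything posted to the list: it sources an rc file only if it is readable, does so in a subshell so the caller's environment is untouched, and runs the command with "$@" so arguments containing spaces stay intact. The function name run_with_rc is hypothetical, and the rc file is passed explicitly instead of being hard-coded to ~/.bashrc.

```shell
#!/bin/sh
# Sketch only: run a command after sourcing an rc file, if one is readable.
# Usage: run_with_rc RC_FILE COMMAND [ARGS...]
# The function body is a subshell, so sourced settings do not leak to the caller.
run_with_rc() (
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_rc RC_FILE COMMAND [ARGS...]" >&2
        exit 2
    fi
    rc_file=$1
    shift
    # Pass a path containing a slash, otherwise "." searches PATH for the file.
    if [ -r "$rc_file" ]; then
        . "$rc_file"
    fi
    # "$@" keeps each argument as its own word; "$*" would merge them into one.
    "$@"
)

# Harmless demonstration: the rc file does not exist, so only the command runs.
run_with_rc /nonexistent/rcfile printf '%s\n' "hello world"
```

Saved as a script on the remote side, something like this could stand in for run.sh in the globus-job-run invocation, with the site's own rc file as the first argument.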