From onionknigh at gmail.com Thu Apr 1 15:05:46 2010 From: onionknigh at gmail.com (Rickard Westerlund) Date: Thu, 1 Apr 2010 22:05:46 +0200 Subject: [Swift-user] Problems running example scripts Message-ID: Hello, I've got problems running swift scripts on both Windows and Linux. On Windows I get errors that indicate my system is expected to be Unix, output here: http://pastebin.com/CqwFrxDx Is there something wrong with my configuration or is there a lack of Windows support? On Linux I can't get the regexp.swift and foreach.swift to run properly, they both complain about the character '1' in the regexp transform. Changing the expression to "\\1count" from "\1count" seems to make it compile, but then I get the following output: http://pastebin.com/NDWK4541 Every other example works fine though. --- Rickard Westerlund From hategan at mcs.anl.gov Thu Apr 1 15:16:57 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 01 Apr 2010 15:16:57 -0500 Subject: [Swift-user] Problems running example scripts In-Reply-To: References: Message-ID: <1270153017.10351.1.camel@localhost> On Thu, 2010-04-01 at 22:05 +0200, Rickard Westerlund wrote: > Hello, > I've got problems running swift scripts on both Windows and Linux. On > Windows I get errors that indicate my system is expected to be Unix, > output here: http://pastebin.com/CqwFrxDx > Is there something wrong with my configuration or is there a lack of > Windows support? See this: http://www.ci.uchicago.edu/swift/guides/userguide.php#tips.windows > > On Linux I can't get the regexp.swift and foreach.swift to run > properly, they both complain about the character '1' in the regexp > transform. Changing the expression to "\\1count" from "\1count" seems > to make it compile, but then I get the following output: > http://pastebin.com/NDWK4541 That seems to indicate that your tc.data does not have an entry for "wc". Is that correct? 
Mihael From hategan at mcs.anl.gov Thu Apr 1 17:03:46 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 01 Apr 2010 17:03:46 -0500 Subject: [Swift-user] Problems running example scripts In-Reply-To: References: <1270153017.10351.1.camel@localhost> Message-ID: <1270159426.14641.2.camel@localhost> On Thu, 2010-04-01 at 23:57 +0200, Rickard Westerlund wrote: > On Thu, Apr 1, 2010 at 10:16 PM, Mihael Hategan wrote: > > See this: > > http://www.ci.uchicago.edu/swift/guides/userguide.php#tips.windows > That certainly helped, now I'm just getting a "execution failled: > exception in echo" error and the job failing with an exit code of 252. > I tried changing paths in tc.data but that didn't help. Removing the > line for echo gives the "could not find any valid host for task" > error. Right. Most of the examples will probably fail to work because the standard unix executables don't exist in windows (unless you install cygwin). So you would need to use some valid windows executables there and write swift scripts that make sense for them. > > > That seems to indicate that your tc.data does not have an entry for > > "wc". Is that correct? > Yep, adding an entry for it solved it. Ok. Good. From onionknigh at gmail.com Thu Apr 1 17:17:46 2010 From: onionknigh at gmail.com (Rickard Westerlund) Date: Fri, 2 Apr 2010 00:17:46 +0200 Subject: [Swift-user] Problems running example scripts In-Reply-To: <1270159426.14641.2.camel@localhost> References: <1270153017.10351.1.camel@localhost> <1270159426.14641.2.camel@localhost> Message-ID: On Fri, Apr 2, 2010 at 12:03 AM, Mihael Hategan wrote: > Right. Most of the examples will probably fail to work because the > standard unix executables don't exist in windows (unless you install > cygwin). I see. I have gnuwin installed and had tried to use the echo from there, but got the same error. Also tried to use DOS naming to avoid spaces. Guess I'll have to find a format that works. Thanks for your fast responses. 
--- Rickard Westerlund

From hategan at mcs.anl.gov Thu Apr 1 17:28:53 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 01 Apr 2010 17:28:53 -0500
Subject: [Swift-user] Problems running example scripts
In-Reply-To:
References: <1270153017.10351.1.camel@localhost> <1270159426.14641.2.camel@localhost>
Message-ID: <1270160933.16404.0.camel@localhost>

On Fri, 2010-04-02 at 00:17 +0200, Rickard Westerlund wrote:
> On Fri, Apr 2, 2010 at 12:03 AM, Mihael Hategan wrote:
> > Right. Most of the examples will probably fail to work because the
> > standard unix executables don't exist in windows (unless you install
> > cygwin).
> I see. I have gnuwin installed and had tried to use the echo from
> there, but got the same error. Also tried to use DOS naming to avoid
> spaces. Guess I'll have to find a format that works.

Make sure you include the extension (i.e. echo.exe) in tc.data.

From wilde at mcs.anl.gov Thu Apr 1 23:44:00 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 1 Apr 2010 23:44:00 -0500 (CDT)
Subject: [Swift-user] Article on Swift in SciDAC Review
Message-ID: <10961150.1131270183440578.JavaMail.root@zimbra>

This just came out:

http://www.scidacreview.org/1002/html/swift.html

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From amilamad at gmail.com Fri Apr 2 12:36:51 2010
From: amilamad at gmail.com (Amila Madusanka)
Date: Fri, 2 Apr 2010 23:06:51 +0530
Subject: [Swift-user] Can`t run the first.swift
Message-ID:

When running "swift first.swift" I get the following message:

Execution failed:
File not found scheduler.xml

Please help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Fri Apr 2 13:03:17 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 02 Apr 2010 13:03:17 -0500
Subject: [Swift-user] Can`t run the first.swift
In-Reply-To:
References:
Message-ID: <1270231397.2398.1.camel@localhost>

Hi,

You'll need to provide more details than that, such as what system/OS
you're trying to run this on, what you did to download/compile/install
Swift, perhaps the log file that the failed run produced, etc.

Mihael

On Fri, 2010-04-02 at 23:06 +0530, Amila Madusanka wrote:
> when runing -> swift first.swift I get the following massage
>
> Execution failed:
> File not found scheduler.xml
>
> please help
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From hategan at mcs.anl.gov Fri Apr 2 13:13:22 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 02 Apr 2010 13:13:22 -0500
Subject: [Swift-user] Can`t run the first.swift
In-Reply-To:
References: <1270231397.2398.1.camel@localhost>
Message-ID: <1270232002.4159.3.camel@localhost>

On Fri, 2010-04-02 at 23:38 +0530, Amila Madusanka wrote:
> I am new to the swift.The OS is Windows Vista Business Edition.I
> extract the swift 0.8 and set the environment variables to the bin.

Which environment variables? Please be specific. How are you launching
swift (i.e. what exact commands are you typing)?

> But for each example it gives
>
> Execution failed:
>
> File not found scheduler.xml
>

Yes. You already mentioned that.
Please read this: http://www.chiark.greenend.org.uk/~sgtatham/bugs.html Mihael From hategan at mcs.anl.gov Fri Apr 2 15:01:16 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Apr 2010 15:01:16 -0500 Subject: [Swift-user] Can`t run the first.swift In-Reply-To: References: <1270231397.2398.1.camel@localhost> <1270232002.4159.3.camel@localhost> Message-ID: <1270238476.27377.6.camel@localhost> On Fri, 2010-04-02 at 23:49 +0530, Amila Madusanka wrote: > I set the "path" environment variable. > And go to example dir and use command "swift first.swift" (As in > Swift Quickstart Guide) 1. You should keep the mailing list CC-ed. 2. We don't support windows much. It's more like we try to make it run there, but there are no guarantees. So you have two choices: a) Try to troubleshoot and fix the problems on windows, which would become part of your GSOC contributions to the project. b) Install cygwin or use linux 3. I don't think Swift 0.8 works with Windows. I would recommend trying the SVN version, either the stable branch or trunk. If you are going to be a GSOC student for Swift, that is the code you will work with. 4. Read http://www.ci.uchicago.edu/swift/guides/userguide.php#tips.windows Mihael From wozniak at mcs.anl.gov Mon Apr 5 10:11:19 2010 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 5 Apr 2010 10:11:19 -0500 (CDT) Subject: [Swift-user] @tostring? In-Reply-To: <9408326.381681269727849710.JavaMail.root@zimbra> References: <9408326.381681269727849710.JavaMail.root@zimbra> Message-ID: On Sat, 27 Mar 2010, wilde at mcs.anl.gov wrote: > (3) Justin has added an @java() primitive to the development trunk; so > if there's a Java method among standard or Swift Java classes that you > can call to do a needed string operation, that may be useful. I cant > recall if that was posted to one of the lists or not. Sorry for the delay. 
Here is the link to the @java note:

http://mail.ci.uchicago.edu/pipermail/swift-user/2010-March/001382.html

Also, I implemented an @tostring() back when I thought @strcat() was
typechecked. You can try that if you want.

> === (1) - A "toy" Swift library:
>
> // General Swift Lib Functions
>
> app (file o) echoi (int i) { echo i stdout=@o;}
> app (file o) echof (float f) { echo f stdout=@o;}
> app (file o) echob (boolean b) { echo b stdout=@o;}
> app (file o) echos (string s) { echo s stdout=@o;}
>
> (string s) itostr (int i)
> {
>   file f;
>   f = echoi(i);
>   s = readData(f);
> }
>
> (string s) ftostr (float n)
> {
>   file f;
>   f = echof(n);
>   s = readData(f);
> }
>
> (int n) strtoi (string s)
> {
>   file f;
>   f = echos(s);
>   n = readData(f);
> }
>
> (float n) strtof (string s)
> {
>   file f;
>   f = echos(s);
>   n = readData(f);
> }
>
> app (file o) sprintfsApp (string fmt, string e[])
> {
>   sprintfs fmt e stdout=@o;
> }
>
> (string s) sprintfs (string fmt, string e[])
> {
>   file f;
>   f = sprintfsApp(fmt,e);
>   s = readData(f);
> }
>
> === (2) swiftshell:
>
> login1$ more shelldemo.swift swiftshell
> ::::::::::::::
> shelldemo.swift
> ::::::::::::::
> type file;
>
> app (file o) cat (file i)
> {
>   shell " ( cat " @i "; date; hostname ) | grep . " stdout=@o;
> }
>
> file data<"data.txt">;
> file out<"out.txt">;
> out = cat(data);
> ::::::::::::::
> swiftshell
> ::::::::::::::
> bash -c "$*"
> login1$ grep shell tc
> localhost shell /home/wilde/swift/lab/swiftshell null null null
> mcs shell /home/wilde/swift/lab/swiftshell null null null
> login1$
>
> === (3) @java()
>
> Justin previously posted this to the list:
>
> " If you can check out the latest Swift from trunk I've added some
> features that might help you out here. There's a new built-in function @java()
> that allows you to call into an existing Java library. You can call into the
> Java Platform or into your CLASSPATH.
> Here is one example:
>
> (float result) sin(float x) {
>   result = @java("java.lang.Math", "sin", x);
> }
>
> float x = 0.5;
> float y = sin(x);
>
> trace("sin", x, y);
>
> Note that you currently have to assign the result of @java() to a variable."
>
>
> ----- "Andriy Fedorov" wrote:
>
>> On Sat, Mar 27, 2010 at 17:34, Michael Wilde wrote:
>>> Andriy, Im pretty sure @strcat() will take an int and return a string.
>>>
>>
>> Mike, yes, you are right -- there was another error in my script,
>> @strcat indeed works with int.
>>
>> One more basic question (sorry if I missed this in the guide): is
>> there a way to get the size of an array? I would like to split a
>> string, and get the last item in the array it returns. Is this
>> possible?
>>
>>
>>> - Mike
>>>
>>> ----- "Andriy Fedorov" wrote:
>>>
>>>> Hi,
>>>>
>>>> Is it possible to convert int type to string type in Swift?
>>>>
>>>> I see @toint, but not @tostring, and it looks like I am not able to
>>>> pass int to @strcat().
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Andriy Fedorov, Ph.D.
>>>> >>>> Research Fellow >>>> Brigham and Women's Hospital >>>> Harvard Medical School >>>> 75 Francis Street >>>> Boston, MA 02115 USA >>>> fedorov at bwh.harvard.edu >>>> _______________________________________________ >>>> Swift-user mailing list >>>> Swift-user at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >>> > > -- Justin M Wozniak From marcin at galton.uchicago.edu Tue Apr 6 00:20:50 2010 From: marcin at galton.uchicago.edu (Marcin Hitczenko) Date: Tue, 6 Apr 2010 00:20:50 -0500 (CDT) Subject: [Swift-user] swift and fusion Message-ID: <40547.207.181.247.181.1270531250.squirrel@galton.uchicago.edu> Hi, I am using swift to submit several R jobs on fusion and am trying to determine whether or not I am making use of all the available cores on each node (I believe there are 8 cores on each node). I submitted 10 really short jobs to try to see if I could determine what was going on, but I don't really know what to look for. In case it is useful, I am attaching the .log file, sites.xml, and an info file for one of the jobs. Thanks for your help, Marcin -------------- next part -------------- A non-text attachment was scrubbed... Name: sites.xml Type: text/xml Size: 375 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: environment_setup-20100405-2320-oi51qh72.log Type: application/octet-stream Size: 341881 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: RBATCH-chvmf5qj-info
Type: application/octet-stream
Size: 2128 bytes
Desc: not available
URL:

From aespinosa at cs.uchicago.edu Sat Apr 10 00:30:24 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Sat, 10 Apr 2010 00:30:24 -0500
Subject: [Swift-user] throttling mapping threads
Message-ID:

Hi,

How do you throttle the number of mapping threads like
foreach.max.threads ? I have this workflow whose mapper accesses a
mysql database that makes a 400+ access at a time.

I can change foreach.max.threads, but that then would affect the
number of jobs I could send at a time too.

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From hategan at mcs.anl.gov Sat Apr 10 00:36:16 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 10 Apr 2010 00:36:16 -0500
Subject: [Swift-user] throttling mapping threads
In-Reply-To:
References:
Message-ID: <1270877776.14163.1.camel@localhost>

On Sat, 2010-04-10 at 00:30 -0500, Allan Espinosa wrote:
> Hi,
>
> How do you throttle the number of mapping threads like
> foreach.max.threads ? I have this workflow whose mapper accesses a
> mysql database that makes a 400+ access at a time.

There is no throttling for those unfortunately.

Any way you can bulk the mapping invocations?

From aespinosa at cs.uchicago.edu Sat Apr 10 00:45:28 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Sat, 10 Apr 2010 00:45:28 -0500
Subject: [Swift-user] throttling mapping threads
In-Reply-To: <1270877776.14163.1.camel@localhost>
References: <1270877776.14163.1.camel@localhost>
Message-ID:

Oh well...

Yeah, I'm actually working on caching the queries somewhere (via memcache)

Thanks!
-Allan

2010/4/10 Mihael Hategan:
> On Sat, 2010-04-10 at 00:30 -0500, Allan Espinosa wrote:
>> Hi,
>>
>> How do you throttle the number of mapping threads like
>> foreach.max.threads ? I have this workflow whose mapper accesses a
>> mysql database that makes a 400+ access at a time.
> > There is no throttling for those unfortunately. > > Any way you can bulk the mapping invocations? > > > From aespinosa at cs.uchicago.edu Mon Apr 12 16:11:01 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 12 Apr 2010 16:11:01 -0500 Subject: [Swift-user] throttling mapping threads In-Reply-To: References: <1270877776.14163.1.camel@localhost> Message-ID: <1271106661.2089.8.camel@origin> I have tried caching queries and still had problems. I just made a C version of my mapper because i thought invoking the interpreter might be too much; still had problems. I guess i'll now limit the foreach.max.threads then... -Allan On Sab, 2010-04-10 at 00:45 -0500, Allan Espinosa wrote: > Oh well... > > Yeah, I'm actually working on caching the queries somewhere (via memcache) > > Thanks! > -Allan > > 2010/4/10 Mihael Hategan : > > On Sat, 2010-04-10 at 00:30 -0500, Allan Espinosa wrote: > >> Hi, > >> > >> How do you throttle the number of mapping threads like > >> foreach.max.threads ? I have this workflow whose mapper accesses a > >> mysql database that makes a 400+ access at a time. > > > > There is no throttling for those unfortunately. > > > > Any way you can bulk the mapping invocations? > > > > > > From aespinosa at cs.uchicago.edu Mon Apr 12 19:28:17 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 12 Apr 2010 19:28:17 -0500 Subject: [Swift-user] throttling mapping threads In-Reply-To: <1271106661.2089.8.camel@origin> References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> Message-ID: <1271118497.2089.10.camel@origin> Aha! I just delegated my mapper to an app function + readdata incantation. Hopefully this will work -Allan On Lun, 2010-04-12 at 16:11 -0500, Allan Espinosa wrote: > I have tried caching queries and still had problems. I just made a C > version of my mapper because i thought invoking the interpreter might be > too much; still had problems. 
>
> I guess i'll now limit the foreach.max.threads then...
> -Allan
>
> On Sab, 2010-04-10 at 00:45 -0500, Allan Espinosa wrote:
> > Oh well...
> >
> > Yeah, I'm actually working on caching the queries somewhere (via memcache)
> >
> > Thanks!
> > -Allan
> >
> > 2010/4/10 Mihael Hategan:
> > > On Sat, 2010-04-10 at 00:30 -0500, Allan Espinosa wrote:
> > >> Hi,
> > >>
> > >> How do you throttle the number of mapping threads like
> > >> foreach.max.threads ? I have this workflow whose mapper accesses a
> > >> mysql database that makes a 400+ access at a time.
> > >
> > > There is no throttling for those unfortunately.
> > >
> > > Any way you can bulk the mapping invocations?

From benc at hawaga.org.uk Tue Apr 13 09:02:58 2010
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 13 Apr 2010 14:02:58 +0000 (GMT)
Subject: [Swift-user] throttling mapping threads
In-Reply-To: <1271118497.2089.10.camel@origin>
References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> <1271118497.2089.10.camel@origin>
Message-ID:

> Aha! I just delegated my mapper to an app function + readdata
> incantation. Hopefully this will work

one invocation per mapping or one for the whole script?

--

From aespinosa at cs.uchicago.edu Tue Apr 13 09:04:57 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 13 Apr 2010 09:04:57 -0500
Subject: [Swift-user] throttling mapping threads
In-Reply-To:
References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> <1271118497.2089.10.camel@origin>
Message-ID:

one for the whole script which is invoked a lot of times

2010/4/13 Ben Clifford:
>
>> Aha! I just delegated my mapper to an app function + readdata
>> incantation. Hopefully this will work
>
> one invocation per mapping or one for the whole script?
> > -- > > > From benc at hawaga.org.uk Tue Apr 13 11:11:12 2010 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 13 Apr 2010 16:11:12 +0000 (GMT) Subject: [Swift-user] throttling mapping threads In-Reply-To: References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> <1271118497.2089.10.camel@origin> Message-ID: > one for the whole script which is invoked a lot of times you're invoking the same swiftscript a whole lot of times? -- From aespinosa at cs.uchicago.edu Tue Apr 13 11:52:52 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 13 Apr 2010 11:52:52 -0500 Subject: [Swift-user] throttling mapping threads In-Reply-To: References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> <1271118497.2089.10.camel@origin> Message-ID: oh i was referring to my ext mapper script 1 swift script invoked once, inside of it is a foreach doing 400+ ext mappings. -Allan 2010/4/13 Ben Clifford : >> one for the whole script which is invoked a lot of times > > > you're invoking the same swiftscript a whole lot of times? > > -- > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Wed Apr 14 00:45:42 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 14 Apr 2010 00:45:42 -0500 Subject: [Swift-user] throttling mapping threads In-Reply-To: References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> <1271118497.2089.10.camel@origin> Message-ID: <1271223942.7014.8.camel@localhost> So I looked at the problem in a bit more detail. The part that starts the external process is run from the karajan worker threads which are limited in number (somewhere between 1 and 8)*. So that's the maximum number of concurrent invocations of an external mapper. Do you actually see all 400 of them run at once? 
Mihael

(*) Funny thing: under cooperative multi-tasking, non-cooperating tasks
such as the example above were what prompted the development of
preemptive multi-tasking. And yet here's an example where, due to lazy
coding, it turns out to be helpful.

On Tue, 2010-04-13 at 11:52 -0500, Allan Espinosa wrote:
> oh i was referring to my ext mapper script
> 1 swift script invoked once, inside of it is a foreach doing 400+ ext mappings.
>
> -Allan
>
> 2010/4/13 Ben Clifford:
> >> one for the whole script which is invoked a lot of times
> >
> >
> > you're invoking the same swiftscript a whole lot of times?
> >
> > --
> >
>
>

From aespinosa at cs.uchicago.edu Wed Apr 14 02:41:07 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 14 Apr 2010 02:41:07 -0500
Subject: [Swift-user] throttling mapping threads
In-Reply-To: <1271223942.7014.8.camel@localhost>
References: <1270877776.14163.1.camel@localhost> <1271106661.2089.8.camel@origin> <1271118497.2089.10.camel@origin> <1271223942.7014.8.camel@localhost>
Message-ID:

Moving the thread to swift-devel.

Ah, you're right. I did a thread trace in my old setup:

$ jstack 8026 | grep "state =" | cat -n
     1  Thread 8051: (state = BLOCKED)
     2  Thread 8050: (state = BLOCKED)
     3  Thread 8047: (state = BLOCKED)
     4  Thread 8045: (state = BLOCKED)
     5  Thread 8043: (state = BLOCKED)
     6  Thread 8042: (state = BLOCKED)
     7  Thread 8041: (state = BLOCKED)
     8  Thread 8040: (state = BLOCKED)
     9  Thread 8034: (state = BLOCKED)
    10  Thread 8033: (state = BLOCKED)
    11  Thread 8032: (state = BLOCKED)
    12  Thread 8026: (state = BLOCKED)

Also, I added a line in my ext mapper that creates a file. 13 minutes
after I started the workflow, only 12 files had been created. I'll check
the new version's swift.log to see how long these files are expected to
take.

-Allan

2010/4/14 Mihael Hategan:
> So I looked at the problem in a bit more detail.
> > The part that starts the external process is run from the karajan worker > threads which are limited in number (somewhere between 1 and 8)*. So > that's the maximum number of concurrent invocations of an external > mapper. Do you actually see all 400 of them run at once? > > Mihael > > (*) Funny thing that for cooperative multi-tasking such non-cooperating > tasks as the example above were the thing that prompted the development > of preemptive multi-tasking. And yet here's an example where, due to > lazy coding, it turns out to be helpful. > > On Tue, 2010-04-13 at 11:52 -0500, Allan Espinosa wrote: >> oh i was referring to my ext mapper script >> 1 swift script invoked once, inside of it is a foreach doing 400+ ext mappings. >> >> -Allan >> >> 2010/4/13 Ben Clifford : >> >> one for the whole script which is invoked a lot of times >> > >> > >> > you're invoking the same swiftscript a whole lot of times? From tstitt at cscs.ch Thu Apr 15 08:06:57 2010 From: tstitt at cscs.ch (Stitt Timothy) Date: Thu, 15 Apr 2010 15:06:57 +0200 Subject: [Swift-user] Can't Access Swift Quickstart and Tutorial HTML Pages Message-ID: Dear Swift Development Team, I was interested in playing around with Swift for the first time today but I can't seem to access the Quickstart and Tutorial HTML pages on the project website. I am receiving some PHP related warnings and errors when I access the links. I would be grateful if you could look into this...particularly for the Quickstart guide that doesn't have an accompanying PDF version. Regards, Tim. 
----------------------------- Timothy Stitt Ph.D National Support Services Swiss National Supercomputing Centre (CSCS), Switzerland Phone: +41 (0) 91 610 8233 Email: stitt(at)cscs.ch From wilde at mcs.anl.gov Thu Apr 15 10:26:51 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 15 Apr 2010 10:26:51 -0500 (CDT) Subject: [Swift-user] Can't Access Swift Quickstart and Tutorial HTML Pages In-Reply-To: Message-ID: <12535219.366391271345211229.JavaMail.root@zimbra> Thanks for spotting this, Tim. Ive notified our sysadmin team to correct the problem (caused by web server re-orgs done yesterday). I hope this will be fixed by the end of the day (CDT) if not sooner. - Mike ----- "Stitt Timothy" wrote: > Dear Swift Development Team, > > I was interested in playing around with Swift for the first time today > but I can't seem to access the Quickstart and Tutorial HTML pages on > the project website. I am receiving some PHP related warnings and > errors when I access the links. > > I would be grateful if you could look into this...particularly for the > Quickstart guide that doesn't have an accompanying PDF version. > > Regards, > > Tim. 
> > ----------------------------- > > Timothy Stitt Ph.D > National Support Services > Swiss National Supercomputing Centre (CSCS), Switzerland > Phone: +41 (0) 91 610 8233 > > Email: stitt(at)cscs.ch > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sat Apr 17 14:54:16 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 17 Apr 2010 14:54:16 -0500 (CDT) Subject: [Swift-user] Swift documentation is back online In-Reply-To: <12535219.366391271345211229.JavaMail.root@zimbra> Message-ID: <27037321.419261271534056463.JavaMail.root@zimbra> Timothy, all - my apologies for the outage. - Mike ----- "Michael Wilde" wrote: > Thanks for spotting this, Tim. Ive notified our sysadmin team to > correct the problem (caused by web server re-orgs done yesterday). I > hope this will be fixed by the end of the day (CDT) if not sooner. > > - Mike > > ----- "Stitt Timothy" wrote: > > > Dear Swift Development Team, > > > > I was interested in playing around with Swift for the first time > today > > but I can't seem to access the Quickstart and Tutorial HTML pages on > > the project website. I am receiving some PHP related warnings and > > errors when I access the links. > > > > I would be grateful if you could look into this...particularly for > the > > Quickstart guide that doesn't have an accompanying PDF version. > > > > Regards, > > > > Tim. 
> >
> > -----------------------------
> >
> > Timothy Stitt Ph.D
> > National Support Services
> > Swiss National Supercomputing Centre (CSCS), Switzerland
> > Phone: +41 (0) 91 610 8233
> >
> > Email: stitt(at)cscs.ch
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From tstitt at cscs.ch Mon Apr 19 04:32:07 2010
From: tstitt at cscs.ch (Stitt Timothy)
Date: Mon, 19 Apr 2010 11:32:07 +0200
Subject: [Swift-user] Re: Swift documentation is back online
In-Reply-To: <27037321.419261271534056463.JavaMail.root@zimbra>
References: <27037321.419261271534056463.JavaMail.root@zimbra>
Message-ID: <432ED015-BEBE-41A5-90A4-9D9B81D29C4C@cscs.ch>

No worries Mike...thanks for that. All working now.

Cheers,

Tim.

On Apr 17, 2010, at 9:54 PM, Michael Wilde wrote:

Timothy, all - my apologies for the outage.

- Mike

----- "Michael Wilde" wrote:

Thanks for spotting this, Tim. Ive notified our sysadmin team to
correct the problem (caused by web server re-orgs done yesterday). I
hope this will be fixed by the end of the day (CDT) if not sooner.

- Mike

----- "Stitt Timothy" wrote:

Dear Swift Development Team,

I was interested in playing around with Swift for the first time today
but I can't seem to access the Quickstart and Tutorial HTML pages on
the project website. I am receiving some PHP related warnings and
errors when I access the links.

I would be grateful if you could look into this...particularly for the
Quickstart guide that doesn't have an accompanying PDF version.

Regards,

Tim.
----------------------------- Timothy Stitt Ph.D National Support Services Swiss National Supercomputing Centre (CSCS), Switzerland Phone: +41 (0) 91 610 8233 Email: stitt(at)cscs.ch _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory ----------------------------- Timothy Stitt Ph.D National Support Services Swiss National Supercomputing Centre (CSCS), Switzerland Phone: +41 (0) 91 610 8233 Email: stitt(at)cscs.ch From wilde at mcs.anl.gov Tue Apr 27 06:04:48 2010 From: wilde at mcs.anl.gov (wilde at mcs.anl.gov) Date: Tue, 27 Apr 2010 06:04:48 -0500 (CDT) Subject: [Swift-user] Set maxtime > maxwalltime or your script will hang In-Reply-To: <8065554.659051272366271851.JavaMail.root@zimbra> Message-ID: <11830128.659071272366288457.JavaMail.root@zimbra> [cc'ing swift-user] Marcin, Quick answer: Since you changed the maxwalltime in tc.data to 10 minutes, change the "maxtime" setting in sites.xml to N times 10 minutes *plus* 1 minute. The "plus" is important. Long answer: Swift deducts a small fraction of maxtime to use to cleanly shut down the PBS job. The default for this "reserve time" (see the Users Guide) is 10 seconds. So while before it was happily fitting 15-second jobs (the prior setting you had for maxwalltime) into (600-10) second slots, now (with maxwalltime increased to 10 minutes) it could not find any slots into which it could fit a 10 *minute* job. Unfortunately, at the moment, Swift just hangs, continuing to try to find a slot until the maxtime time runs out and the PBS jobs shut down. 
(There are "good" reasons for this "bad" behavior, which we need to fix.)

So bottom line: make maxtime some multiple of maxwalltime, and add a bit
to maxtime (say 1 minute, or at least 10 seconds). Note that since
maxwalltime is just an *estimate* of how long you expect the apptask to
run for, this division is by necessity approximate. After an apptask
finishes on a coaster worker CPU, the CPU becomes free, and has some
varying amount of time left before the coaster worker expires. Then the
process repeats, and Swift again uses maxwalltime to see if there is a
coaster worker with at least maxwalltime remaining that can run the next
apptask.

- Mike

ps. How about if from now on, if you forget to cc swift-user on these
questions, then I will just cc the list on my replies. This has 2
benefits: other users benefit from your questions and my answers, and
other swift developers and users can contribute more advice, suggest
better approaches, or correct me when I goof.

You should join the swift-user list if you have not already done so:
http://www.ci.uchicago.edu/swift/support/index.php

Thanks!

----- "Marcin Hitczenko" wrote:

> Hi Mike,
>
> Sorry to be such a pain, but I can't get the jobs to run again. I am still
> running environment_setup.swift and have not made any changes to the
> pbscoast.xml file other than to change walltime. I am using the same
> tc.data file. I again run into the problem where it seems to run for a
> minute, then fail and restart and it does so over and over. I logged into
> the node and it didn't seem to be running my stuff. Also, the output files
> are not being written.
>
> I am not sure why it would not work now, though I ran it and it worked
> before.
>
> Marcin

From marcin at galton.uchicago.edu Tue Apr 27 10:56:17 2010
From: marcin at galton.uchicago.edu (Marcin Hitczenko)
Date: Tue, 27 Apr 2010 10:56:17 -0500 (CDT)
Subject: [Swift-user] Re: Set maxtime > maxwalltime or your script will hang
Message-ID: <46971.207.181.247.181.1272383777.squirrel@galton.uchicago.edu>

Hi Mike,

Two unrelated questions:

1. Suppose I have 8 apptask calls, each of which takes one hour
(maxwalltime is 01:00:00). I should be able to accomplish this by
submitting one PBS job for one node / 8 cores and having each core run
a different task. If I understand correctly, this should take on the
order of 1 hour plus some extra. But we would set maxwalltime to
60*60*8, or 8 hours. Why should it run for 8 hours? I feel I may still
be misunderstanding the distinctions.

2. I submitted a job and would like to cancel it (because it has an
error). I use qdel to cancel the job, but within a few minutes a new
job restarts. I take it this is swift retrying. How do I actually
cancel the entire job? I have done qdel quite a few times, but jobs
keep popping up.

Thanks,

Marcin

From wilde at mcs.anl.gov Tue Apr 27 11:06:09 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 27 Apr 2010 11:06:09 -0500 (CDT)
Subject: [Swift-user] Re: Set maxtime > maxwalltime or your script will hang
In-Reply-To: <46971.207.181.247.181.1272383777.squirrel@galton.uchicago.edu>
Message-ID: <17326438.669351272384369740.JavaMail.root@zimbra>

Marcin,

----- "Marcin Hitczenko" wrote:

> Hi Mike,
>
> Two unrelated questions:
>
> 1. Suppose I have 8 apptask calls, each of which takes one hour
> (maxwalltime is 01:00:00). I should be able to accomplish this by
> submitting one PBS job for one node / 8 cores and having each core
> run a different task. If I understand correctly, this should take on
> the order of 1 hour plus some extra. But we would set maxwalltime to
> 60*60*8, or 8 hours. Why should it run for 8 hours? I feel I may
> still be misunderstanding the distinctions.

For this case, set maxwalltime to 01:00:00 (i.e., the estimated
duration of each app() task) and maxtime to 3700 (1 hour with 100 secs
"reserve").

One caveat, though, for running on the Fusion cluster: I know that if
you ask for the batch queue, it insists that you request at least 2
nodes; if you ask for one, your job is rejected. We need to test
exactly how to configure coasters for 1 node on that system. You can
try removing the queue element from your sites.xml, and see how it
behaves. I need to do more testing on that system and post the
results.

> 2. I submitted a job and would like to cancel it (because it has an
> error). I use qdel to cancel the job, but within a few minutes a new
> job restarts. I take it this is swift retrying. How do I actually
> cancel the entire job? I have done qdel quite a few times, but jobs
> keep popping up.

I think the best way to clean up is to interrupt/kill the swift
command with a ^C, and then it should clean up its PBS jobs. I think
this has been working for me; let us know if that doesn't work for
you.
- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Apr 27 11:48:35 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Apr 2010 11:48:35 -0500
Subject: [Swift-user] Set maxtime > maxwalltime or your script will hang
In-Reply-To: <11830128.659071272366288457.JavaMail.root@zimbra>
References: <11830128.659071272366288457.JavaMail.root@zimbra>
Message-ID: <1272386915.3670.2.camel@localhost>

On Tue, 2010-04-27 at 06:04 -0500, wilde at mcs.anl.gov wrote:
> [cc'ing swift-user]
>
> Marcin,
>
> Quick answer: Since you changed the maxwalltime in tc.data to 10
> minutes, change the "maxtime" setting in sites.xml to N times 10
> minutes *plus* 1 minute. The "plus" is important.
>
> Long answer:
>
> Swift deducts a small fraction of maxtime to use to cleanly shut down
> the PBS job. The default for this "reserve time" (see the User Guide)
> is 10 seconds. So while before it was happily fitting 15-second jobs
> (the prior setting you had for maxwalltime) into (600-10) second
> slots, now (with maxwalltime increased to 10 minutes) it could not
> find any slots into which it could fit a 10 *minute* job.
> Unfortunately, at the moment, Swift just hangs, continuing to try to
> find a slot until maxtime runs out and the PBS jobs shut down. (There
> are "good" reasons for this "bad" behavior, which we need to fix.)

A comment there:
Set the "maxtime" coaster parameter if you know that the queue you are
using has a limit to the time a job can have. Don't set it because
your jobs have a certain maxwalltime. Its purpose is to prevent the
creation of blocks that cannot be run on a given queue.

From marcin at galton.uchicago.edu Tue Apr 27 12:00:39 2010
From: marcin at galton.uchicago.edu (Marcin Hitczenko)
Date: Tue, 27 Apr 2010 12:00:39 -0500 (CDT)
Subject: [Swift-user] Re: Set maxtime > maxwalltime or your script will hang
In-Reply-To: <17326438.669351272384369740.JavaMail.root@zimbra>
References: <17326438.669351272384369740.JavaMail.root@zimbra>
Message-ID: <44445.207.181.247.181.1272387639.squirrel@galton.uchicago.edu>

Hi,

> Marcin,
>
> ----- "Marcin Hitczenko" wrote:
>
>> Hi Mike,
>>
>> Two unrelated questions:
>>
>> 1. Suppose I have 8 apptask calls, each of which takes one hour
>> (maxwalltime is 01:00:00). I should be able to accomplish this by
>> submitting one PBS job for one node / 8 cores and having each core
>> run a different task. If I understand correctly, this should take on
>> the order of 1 hour plus some extra. But we would set maxwalltime to
>> 60*60*8, or 8 hours. Why should it run for 8 hours? I feel I may
>> still be misunderstanding the distinctions.
>
> For this case, set maxwalltime to 01:00:00 (i.e., the estimated
> duration of each app() task) and maxtime to 3700 (1 hour with 100
> secs "reserve").
>
> One caveat, though, for running on the Fusion cluster: I know that if
> you ask for the batch queue, it insists that you request at least 2
> nodes; if you ask for one, your job is rejected. We need to test
> exactly how to configure coasters for 1 node on that system. You can
> try removing the queue element from your sites.xml, and see how it
> behaves. I need to do more testing on that system and post the
> results.

But even if we have 16 app() tasks and we call for 2 nodes, shouldn't
maxtime be more or less the same, i.e. 3600+reserve? I guess my
question refers to maxtime=maxwalltime*N + reserve. Shouldn't N be the
number of waves, or (# app() tasks)/(8cores/node*#nodes), rather than
the # of app() tasks?

>> 2. I submitted a job and would like to cancel it (because it has an
>> error). I use qdel to cancel the job, but within a few minutes a new
>> job restarts. I take it this is swift retrying. How do I actually
>> cancel the entire job? I have done qdel quite a few times, but jobs
>> keep popping up.

I am using the command: swift .... >& swift.out &, so I can't kill the
command directly, I don't think.

> I think the best way to clean up is to interrupt/kill the swift
> command with a ^C, and then it should clean up its PBS jobs. I think
> this has been working for me; let us know if that doesn't work for
> you.
>
> - Mike

From wilde at mcs.anl.gov Tue Apr 27 12:05:40 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 27 Apr 2010 12:05:40 -0500
Subject: [Swift-user] Re: Set maxtime > maxwalltime or your script will hang
In-Reply-To: <44445.207.181.247.181.1272387639.squirrel@galton.uchicago.edu>
References: <17326438.669351272384369740.JavaMail.root@zimbra> <44445.207.181.247.181.1272387639.squirrel@galton.uchicago.edu>
Message-ID: <614A5D33-1D76-47F2-BAFE-1ACB627FCD03@mcs.anl.gov>

On Apr 27, 2010, at 12:00 PM, Marcin Hitczenko wrote:

> But even if we have 16 app() tasks and we call for 2 nodes, shouldn't
> maxtime be more or less the same, i.e. 3600+reserve? I guess my
> question refers to maxtime=maxwalltime*N + reserve. Shouldn't N be
> the number of waves, or (# app() tasks)/(8cores/node*#nodes), rather
> than the # of app() tasks?

Yes, sorry for the confusion - that is correct: by N I meant "waves",
not jobs.
- Mike

From hategan at mcs.anl.gov Tue Apr 27 12:12:27 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Apr 2010 12:12:27 -0500
Subject: [Swift-user] Re: Set maxtime > maxwalltime or your script will hang
In-Reply-To: <44445.207.181.247.181.1272387639.squirrel@galton.uchicago.edu>
References: <17326438.669351272384369740.JavaMail.root@zimbra> <44445.207.181.247.181.1272387639.squirrel@galton.uchicago.edu>
Message-ID: <1272388347.6519.3.camel@localhost>

> > For this case, set maxwalltime to 01:00:00 (ie the estimated
> > duration of each app() task) and maxtime to 3700 (1 hour with 100
> > secs "reserve")
> >
> > One caveat, though, for running on the Fusion cluster: I know that
> > if you ask for the batch queue, it insists that you request at
> > least 2 nodes; if you ask for one, your job is rejected. We need to
> > test exactly how to configure coasters for 1 node on that system.
> > You can try removing the queue element from your sites.xml, and see
> > how it behaves. I need to do more testing on that system and post
> > the results.
>
> But even if we have 16 app() tasks and we call for 2 nodes, shouldn't
> maxtime be more or less the same i.e. 3600+reserve? I guess my
> question refers to maxtime=maxwalltime*N + reserve. Shouldn't N be
> the number of waves or (# app() tasks)/(8cores/node*#nodes), rather
> than the # of app() tasks?

No. Maxtime is something that is specific to the cluster, not the jobs
you are running. It should be the maximum allowed time for a job in
the queue you are using. Actual coaster blocks may have a different
walltime, but they will never have a walltime larger than maxtime.
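
The arithmetic discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not Swift's actual coaster scheduler: the function names `waves` and `fits` are hypothetical, and the 10-second default reserve is the value Mike quotes from the User Guide. It shows why a 10-minute task cannot be placed in a 600-second block, and how Marcin's "waves" count is computed. Per Mihael's reply, maxtime itself is better set from the queue's limit than derived from these numbers.

```python
import math

def waves(num_tasks, cores_per_node, nodes):
    """Sequential rounds needed to run num_tasks, one task per core (hypothetical helper)."""
    return math.ceil(num_tasks / (cores_per_node * nodes))

def fits(maxwalltime_s, maxtime_s, reserve_s=10):
    """True if a task estimated at maxwalltime_s fits in a block of
    maxtime_s seconds once reserve_s seconds are held back for shutdown."""
    return maxwalltime_s <= maxtime_s - reserve_s

# 16 one-hour tasks on 2 nodes with 8 cores each
print(waves(16, 8, 2))
# A 10-minute task against a 600 s block with the default 10 s reserve
print(fits(600, 600))
# The same task once one minute is added to the block
print(fits(600, 660))
```

With the values above, the 16 tasks form a single wave, the 600-second block is too short for the 10-minute task, and the 660-second block is long enough.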
From wilde at mcs.anl.gov Tue Apr 27 12:16:48 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 27 Apr 2010 12:16:48 -0500
Subject: [Swift-user] Set maxtime > maxwalltime or your script will hang
In-Reply-To: <1272386915.3670.2.camel@localhost>
References: <11830128.659071272366288457.JavaMail.root@zimbra> <1272386915.3670.2.camel@localhost>
Message-ID:

On Apr 27, 2010, at 11:48 AM, Mihael Hategan wrote:

> A comment there:
> Set the "maxtime" coaster parameter if you know that the queue you are
> using has a limit to the time a job can have. Don't set it because
> your jobs have a certain maxwalltime. Its purpose is to prevent the
> creation of blocks that cannot be run on a given queue.

Mihael, can you clarify this (and the related text from the User
Guide)?

Users almost always want to get their jobs into a "good" queue, and
hence they usually feel compelled to set maxtime. So more questions
arise when you try to figure out how to do this:

- How does coasters fill blocks?

- What LRM wall time will be used when maxtime is not specified?
  (i.e., how many jobs will coasters place in a block)?

- Is coasters committing more jobs to specific blocks (and hence
  specific sites) than the block has cores, when it computes its
  schedules? Or just estimating what it may need?

From hategan at mcs.anl.gov Tue Apr 27 12:30:40 2010
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Apr 2010 12:30:40 -0500
Subject: [Swift-user] Set maxtime > maxwalltime or your script will hang
In-Reply-To:
References: <11830128.659071272366288457.JavaMail.root@zimbra> <1272386915.3670.2.camel@localhost>
Message-ID: <1272389440.6945.8.camel@localhost>

On Tue, 2010-04-27 at 12:16 -0500, Michael Wilde wrote:
> On Apr 27, 2010, at 11:48 AM, Mihael Hategan wrote:
>
> > A comment there:
> > Set the "maxtime" coaster parameter if you know that the queue you
> > are using has a limit to the time a job can have. Don't set it
> > because your jobs have a certain maxwalltime. Its purpose is to
> > prevent the creation of blocks that cannot be run on a given queue.
>
> Mihael, can you clarify this (and the related text from the User
> Guide)?
>
> Users almost always want to get their jobs into a "good" queue, and
> hence they usually feel compelled to set maxtime. So more questions
> arise when you try to figure out how to do this:
>
> - How does coasters fill blocks?

Largest job in the smallest space that can fit it. Then repeat.

> - What LRM wall time will be used when maxtime is not specified?
>   (i.e., how many jobs will coasters place in a block)?

That is controlled by the overallocation numbers. Here's the role of
maxtime:

    blockTime = calculateFromJobsAndOverallocationSettings();
    if (blockTime > maxtime) {
        blockTime = maxtime;
    }

So you shouldn't try to use "maxtime" to control the block walltime.
That is not its purpose. And I would rename it to have a clearer name,
except I can't think of a better name than "maxtime" for what it does.
Or maybe "neverExceedTime".

> - Is coasters committing more jobs to specific blocks (and hence
>   specific sites) than the block has cores, when it computes its
>   schedules?

It depends on whether you want parallelism or efficiency. The default
values aim for parallelism, so it will try to allocate as many cores
as you have jobs. So typically you won't see more cores allocated than
jobs, unless you are hitting some granularity.

> Or just estimating what it may need?

It is estimating what it may need.
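
Mihael's pseudocode for the role of maxtime can be written as a small runnable sketch. The Python below is only an illustration under stated assumptions: `block_walltime` is a hypothetical name, `estimated_s` stands in for the result of `calculateFromJobsAndOverallocationSettings()` (whose internals are not shown in this thread), and treating an unset maxtime as "no cap" is an assumption rather than documented behavior.

```python
def block_walltime(estimated_s, maxtime_s=None):
    """Cap a block's walltime at maxtime, mirroring the pseudocode above.

    estimated_s: walltime (seconds) derived from the queued jobs and the
                 overallocation settings (placeholder for the real calculation).
    maxtime_s:   queue limit in seconds, or None if maxtime is not set
                 (assumed here to mean "no cap").
    """
    if maxtime_s is not None and estimated_s > maxtime_s:
        return maxtime_s
    return estimated_s

# Estimate above the queue limit: the block is capped at maxtime
print(block_walltime(7200, 3600))
# Estimate below the queue limit: maxtime has no effect
print(block_walltime(1800, 3600))
# No maxtime given: the estimate is used unchanged
print(block_walltime(7200))
```

The point of the sketch is that maxtime only ever lowers the block walltime; it does not raise or otherwise choose it, which is why it is a poor knob for controlling how long blocks run.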