From yadudoc1729 at gmail.com Fri Mar 1 02:23:30 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Fri, 1 Mar 2013 13:53:30 +0530
Subject: [Swift-devel] Weekly training sessions
In-Reply-To:
References: <1568462793.1945423.1362083469515.JavaMail.root@ci.uchicago.edu>
 <1455849172.1964578.1362085731588.JavaMail.root@ci.uchicago.edu>
Message-ID:

Could we please have these over live or recorded video?

-Yadu

On Fri, Mar 1, 2013 at 4:49 AM, Tim Armstrong wrote:

> I could probably put together a run-through of Swift/T at some point - a
> high level overview of the compiler/runtime stack, talk about some of the
> current limitations plus some of the language features we've been playing
> with.
>
> It would be good to communicate more with you guys so we can get each
> other more up to speed on the state of play, and maybe think through future
> directions for development.
>
> - Tim
>
>
> On Thu, Feb 28, 2013 at 4:46 PM, Glen Hocky wrote:
>
>> As a remote outsider, I'd be interested in watching a video (live or
>> recorded, but I guess pref both?)
>> -Glen
>>
>>
>> On Thu, Feb 28, 2013 at 5:44 PM, Scott Krieder wrote:
>>
>>> As an outsider (if I get a vote) I would definitely be interested in
>>> attending.
>>>
>>> -Scott
>>>
>>> On Thu, Feb 28, 2013 at 3:08 PM, David Kelly wrote:
>>>
>>>> Hello,
>>>>
>>>> I was thinking it might be a good time to bring back the weekly
>>>> developer/training sessions we used to have. I always found them very
>>>> useful. I think Justin has a list of future topics somewhere (which I can't
>>>> seem to find at the moment), and I'm sure there are a lot of new things to
>>>> discuss as well.
>>>> Some topics that come to mind:
>>>>
>>>> Swift-T tutorial
>>>> An overview of coaster configurations (automatic/passive/persistent/etc)
>>>> MPI
>>>> Modis
>>>> Understanding mappers
>>>> Methods for approaching file I/O
>>>> Running swift on EC2
>>>> Running swift on OSG
>>>>
>>>> Maybe we could alternate who gives the training each week so that no
>>>> one developer has to spend too much time on it.
>>>>
>>>> Any interest?
>>>>
>>>> * David Michael Kelly*
>>>> Systems Programmer
>>>> University of Chicago Computation Institute
>>>> 5735 S. Ellis Ave. Chicago, IL 60637
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>
>>> --
>>> Scott J. Krieder
>>> C: 419-685-0410
>>> E: skrieder at iit.edu
>>> http://datasys.cs.iit.edu/~skrieder/

--
Thanks and Regards,
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From wozniak at mcs.anl.gov Fri Mar 1 05:47:22 2013
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Fri, 1 Mar 2013 05:47:22 -0600 (CST)
Subject: [Swift-devel] Weekly training sessions
In-Reply-To:
Message-ID: <605535474.2169579.1362138442159.JavaMail.root@mcs.anl.gov>

Great! I will set up a poll for the time. We'll work something out for
screen sharing and/or recording.
Justin

----- Original Message -----
From: "Yadu Nand"
To: "Tim Armstrong"
Cc: "Glen Hocky", "Swift Devel"
Sent: Friday, March 1, 2013 2:23:30 AM
Subject: Re: [Swift-devel] Weekly training sessions

Could we please have these over live or recorded video?

-Yadu

From davidk at ci.uchicago.edu Fri Mar 1 07:06:05 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 1 Mar 2013 07:06:05 -0600 (CST)
Subject: [Swift-devel] 0.94 release note draft
In-Reply-To: <1048561590.545450.1360868936831.JavaMail.root@mcs.anl.gov>
Message-ID: <2079798110.2181490.1362143165573.JavaMail.root@ci.uchicago.edu>

Here is the list, based mostly on what I could find in the svn logs.

- The behavior of iterate has changed from 0.93 to 0.94. If you have scripts
  that use iterate, please read
  http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_iterate

- Walltimes are more strictly enforced by coasters. Before Swift version
  0.94, if an application run with coasters exceeded its specified
  maxwalltime, it was allowed to continue executing. However, if this
  caused the worker on which the application was running to exceed its
  maxwalltime, the queuing system would kill the worker.
  The resulting error message was not always very clear. Since version 0.94,
  coaster workers enforce the user-specified maxwalltime. If an application
  exceeds its maxwalltime, the coaster worker does not allow it to continue,
  but terminates it and reports the error.

- Swift now uses camel case for functions, for example, @toInt instead of
  @toint. The previous naming convention will still work, but you may see
  deprecation warnings.

- Associative arrays have been added. More details and examples can be found at
  http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_associative_arrays

- Dynamic profiles. Many settings formerly only definable in sites.xml can now
  be set on a per-app basis. This can make things easier when running
  multiple apps that have different requirements for settings like processors
  per node and walltime.
  http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_dynamic_profiles

- Added a new ssh command line provider. Previously ssh support was done by
  creating a file called ~/.ssh/auth.defaults. The ssh command line provider
  is more flexible and doesn't require this step. ssh-cl allows you to use
  SSH agents. You can use ssh-cl by adding something like this to your sites.xml:

- Many fixes and improvements to the reliability and performance of
  coaster provider staging.

- Added support for the Slurm scheduler

- Added support for the LSF scheduler

- Improvements to condor provider (non-shared jobtype and more flexibility
  to define what gets added to the submit script).

- Fixes for the textual user interface (TUI). Adding the -tui option to the swift
  command line allows you to monitor progress in a curses-based menu. A brief
  example of this can be found at http://www.ci.uchicago.edu/~davidk/modis.ogv

- Added the ability to call Java methods within swift using @java.
  For example:
  float f = @java("java.lang.Math", "sin", 0.5);
  http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_java

- Added a hang checker that provides the user with more information about
  potential hangs

- Added the @strjoin function for joining strings.
  http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_strjoin

- If you need a command to run on the worker node before any work is
  processed, worker.pl will now execute commands stored in the
  environment variable $WORKER_INIT_CMD

- Use $SWIFT_USERHOME to determine where swift should create some of its
  required files. This defaults to $HOME, but that may cause problems in
  situations where $HOME is not accessible on worker nodes.

- Experimental "wrapper staging" feature that delegates file staging to an
  external wrapper script.

- Various improvements to the way that Swift runs MPI jobs.

- Better OSG integration/support using GlideinWMS.

----- Original Message -----
From: "Michael Wilde"
To: "David Kelly"
Cc: "Swift Devel"
Sent: Thursday, February 14, 2013 1:08:56 PM
Subject: Fwd: 0.94 release note draft

Some notes toward an 0.94 release notes document.

There's a longer list, I think in an IM chat transcript, that we need to incorporate.

Please send additional items to this thread for David to integrate.

Thanks,

- Mike

----- Forwarded Message -----
From: "David Kelly"
To: "Michael Wilde"
Sent: Thursday, January 24, 2013 11:06:18 PM
Subject: Re: 0.94 release note draft

Mike,

I just have the quick notes I took from our meeting. These combined with
your emails are all the changes that I'm aware of at this point.

Iterate differences
Walltime hard limit with coasters
Associative array changes
Tracebacker
Coaster changes / parameters
Slurm and LSF providers
Condor provider changes
ssh-cl
TUI
hang checker
@functions (strjoin, and possibly others)
Dynamic profiles
Wrapper staging
Pass-thru (PBS attributes)
MPI support

----- Original Message -----
> From: "Michael Wilde"
> To: "David Kelly"
> Sent: Thursday, January 24, 2013 7:56:54 PM
> Subject: 0.94 release note draft
>
> Hi David,
>
> I recall I sent you a few batches of line items to list in 0.94
> release notes. Did you gather those somewhere where I can review
> them? (Need them for a status report)
>
> Thanks,
>
> - Mike
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From yadudoc1729 at gmail.com Fri Mar 1 08:49:42 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Fri, 1 Mar 2013 20:19:42 +0530
Subject: [Swift-devel] Scripts with Iterate fails to compile
Message-ID:

Hi,

I'm seeing my scripts fail to compile, just as the scripts in the swift
tests fail, with the same error:

yadu at Miranda:~/swift-0.94/cog/modules/swift/tests/apps$ swift 050-stomp-skel-1.swift
Warning: Function toint is deprecated, at 9
Could not start execution
Failed to convert .swiftx to .kml for 050-stomp-skel-1.swift
no such attribute: refs in template context [iterate]

The script I wrote which doesn't compile is here ->
https://github.com/yadudoc/swift-basics/blob/master/stress/x_iterate.swift

My tests were on swift 0.94 run on my laptop (ubuntu 12.04).

--
Thanks,
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From davidk at ci.uchicago.edu Fri Mar 1 09:18:08 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 1 Mar 2013 09:18:08 -0600 (CST)
Subject: [Swift-devel] Scripts with Iterate fails to compile
In-Reply-To:
Message-ID: <1940171960.2237777.1362151088628.JavaMail.root@ci.uchicago.edu>

Yadu,

The script you wrote works in 0.93, but fails in trunk/0.94. Setting an
array element from within an iterate statement seems to fail. The nightly
test group we run doesn't test for that at the moment - good catch.

----- Original Message -----
> From: "Yadu Nand"
> To: "swift-devel"
> Sent: Friday, March 1, 2013 8:49:42 AM
> Subject: [Swift-devel] Scripts with Iterate fails to compile
>
> Hi,
>
> I'm seeing my scripts fail to compile just as the scripts in swift
> tests fail with the same error:
>
> yadu at Miranda:~/swift-0.94/cog/modules/swift/tests/apps$ swift 050-stomp-skel-1.swift
> Warning: Function toint is deprecated, at 9
> Could not start execution
> Failed to convert .swiftx to .kml for 050-stomp-skel-1.swift
> no such attribute: refs in template context [iterate]
>
> The script I wrote which doesn't compile is here ->
> https://github.com/yadudoc/swift-basics/blob/master/stress/x_iterate.swift
>
> My tests were on swift 0.94 run on my laptop (ubuntu 12.04).
>
> --
> Thanks,
> Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From davidk at ci.uchicago.edu Fri Mar 1 11:49:07 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 1 Mar 2013 11:49:07 -0600 (CST)
Subject: [Swift-devel] 0.94 release note draft
In-Reply-To:
Message-ID: <1043592494.2397903.1362160147401.JavaMail.root@ci.uchicago.edu>

Good point, I'll rephrase that to make it clearer. I believe it's handled
by _swiftwrap.wrapperstaging.

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "David Kelly"
> Cc: "Michael Wilde", "Swift Devel"
> Sent: Friday, March 1, 2013 9:45:27 AM
> Subject: Re: [Swift-devel] 0.94 release note draft
>
> One comment about the 'wrapper staging' feature: my understanding was
> that wrapper staging means the files are staged by the swift
> wrapper: _swiftwrap and not an external wrapper.
>
> Correct me if I am wrong.
>
> Regards,
> Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hategan at mcs.anl.gov Fri Mar 1 12:36:03 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 01 Mar 2013 10:36:03 -0800
Subject: [Swift-devel] 0.94 release note draft
In-Reply-To: <2079798110.2181490.1362143165573.JavaMail.root@ci.uchicago.edu>
References: <2079798110.2181490.1362143165573.JavaMail.root@ci.uchicago.edu>
Message-ID: <1362162963.29670.0.camel@echo>

So I don't think I committed the memory leak fixes to 0.94 and I think
they should be there.

Mihael

From hategan at mcs.anl.gov Fri Mar 1 12:38:41 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 01 Mar 2013 10:38:41 -0800
Subject: [Swift-devel] Scripts with Iterate fails to compile
In-Reply-To: <1940171960.2237777.1362151088628.JavaMail.root@ci.uchicago.edu>
References: <1940171960.2237777.1362151088628.JavaMail.root@ci.uchicago.edu>
Message-ID: <1362163121.29670.1.camel@echo>

This seems to indicate that the variable closing reference counting is
in 0.94. Are we sure we want that?

Mihael

On Fri, 2013-03-01 at 09:18 -0600, David Kelly wrote:
> Yadu,
>
> The script you wrote works in 0.93, but fails in trunk/0.94. Setting an
> array element from within an iterate statement seems to fail. The nightly
> test group we run doesn't test for that at the moment - good catch.

From yadudoc1729 at gmail.com Fri Mar 1 13:23:27 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Sat, 2 Mar 2013 00:53:27 +0530
Subject: [Swift-devel] Swift crashing for runs with 1M calls
Message-ID:

Hi,

I'm trying to see swift behavior with some stress on and I see crashes
at close to 1M calls/loops.

Here's one such case ( x_func.swift code is here ->
https://github.com/yadudoc/swift-basics/blob/master/stress/x_func.swift):
Test run with 0.94 ( corei5 with 8Gb ram)

yadu at Miranda:~/src/swift-basics/stress$ time swift x_func.swift -loops=1000000
Swift 0.94RC3 swift-r6268 cog-r3605

RunID: 20130302-0019-l2alv5d2
Progress: time: Sat, 02 Mar 2013 00:19:38 +0530
Fibonacci[2] = 1
No events in 10s.
Progress: time: Sat, 02 Mar 2013 00:20:08 +0530
Progress: time: Sat, 02 Mar 2013 00:20:38 +0530
Progress: time: Sat, 02 Mar 2013 00:21:08 +0530
Progress: time: Sat, 02 Mar 2013 00:21:38 +0530
Progress: time: Sat, 02 Mar 2013 00:22:08 +0530
Progress: time: Sat, 02 Mar 2013 00:22:38 +0530
Progress: time: Sat, 02 Mar 2013 00:23:08 +0530
Finding dependency loops...Exception in thread "Hang checker" java.lang.StackOverflowError
    at java.util.HashMap.put(HashMap.java:484)
    at java.util.HashSet.add(HashSet.java:217)
    at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:299)
    at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:303)

Earlier I had posted about iterate failing to compile on 0.94. I ran a few
tests on 0.93 (code:
https://github.com/yadudoc/swift-basics/blob/master/stress/x_iterate.swift)
with results:
100K loops -> 17s real
1M loops -> No response for a while, then a java.lang.OutOfMemoryError is thrown.

I'm finding it difficult to keep stats, so I'm placing the stats and as much
run info as possible in comments in the scripts themselves. All the scripts
are here ->
https://github.com/yadudoc/swift-basics/tree/master/stress

-Yadu
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From wilde at mcs.anl.gov Fri Mar 1 14:01:30 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 1 Mar 2013 14:01:30 -0600 (CST)
Subject: [Swift-devel] Swift crashing for runs with 1M calls
In-Reply-To:
Message-ID: <680226079.1624588.1362168090499.JavaMail.root@mcs.anl.gov>

Hurray!!!! Nice work, Yadu.

I think you need to do a few things here:

- put in some (perhaps periodic) logging to see how far fib progressed
- try a non-recursive example: fib() is not reflective of most swift scripts
  and excessively challenges the call stack
- learn how to control Java heap and stack limits (the former is documented
  *I think* in the User Guide and if not needs to be, and should be, in the
  release notes)

Others may have some more specific guidance for you.

- Mike

----- Original Message -----
> From: "Yadu Nand"
> To: "swift-devel"
> Sent: Friday, March 1, 2013 1:23:27 PM
> Subject: [Swift-devel] Swift crashing for runs with 1M calls
>
> Hi,
>
> I'm trying to see swift behavior with some stress on and I see
> crashes at close to 1M calls/loops.
>
> Here's one such case ( x_func.swift code is here ->
> https://github.com/yadudoc/swift-basics/blob/master/stress/x_func.swift ):
> Test run with 0.94 ( corei5 with 8Gb ram)
>
> yadu at Miranda:~/src/swift-basics/stress$ time swift x_func.swift -loops=1000000
> Swift 0.94RC3 swift-r6268 cog-r3605
>
> RunID: 20130302-0019-l2alv5d2
> Progress: time: Sat, 02 Mar 2013 00:19:38 +0530
> Fibonacci[2] = 1
> No events in 10s.
> Progress: time: Sat, 02 Mar 2013 00:20:08 +0530
> Progress: time: Sat, 02 Mar 2013 00:20:38 +0530
> Progress: time: Sat, 02 Mar 2013 00:21:08 +0530
> Progress: time: Sat, 02 Mar 2013 00:21:38 +0530
> Progress: time: Sat, 02 Mar 2013 00:22:08 +0530
> Progress: time: Sat, 02 Mar 2013 00:22:38 +0530
> Progress: time: Sat, 02 Mar 2013 00:23:08 +0530
> Finding dependency loops...Exception in thread "Hang checker" java.lang.StackOverflowError
> at java.util.HashMap.put(HashMap.java:484)
> at java.util.HashSet.add(HashSet.java:217)
> at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:299)
> at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:303)
>
> Earlier I had posted about iterate failing to compile on 0.94.
I ran > a few tests on 0.93 > (code : > https://github.com/yadudoc/swift-basics/blob/master/stress/x_iterate.swift > ) with > results: > 100K loops -> 17s real > 1M loops -> No response for a while, then java.lang.OutOfMemory > exception thrown. > > > I'm finding it difficult to keep stats so, I'm placing the stats and > as much run info as > possible in comments in the scripts themselves. All the scripts are > here -> > https://github.com/yadudoc/swift-basics/tree/master/stress > > > > > -Yadu > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From tim.g.armstrong at gmail.com Fri Mar 1 14:05:03 2013 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Fri, 1 Mar 2013 14:05:03 -0600 Subject: [Swift-devel] Swift crashing for runs with 1M calls In-Reply-To: <680226079.1624588.1362168090499.JavaMail.root@mcs.anl.gov> References: <680226079.1624588.1362168090499.JavaMail.root@mcs.anl.gov> Message-ID: Yadu, awesome, feel free to try and break Swift/T as well :) There isn't any recursion in Yadu's code - it looks like there's a recursive call in the hang checker. - Tim On Fri, Mar 1, 2013 at 2:01 PM, Michael Wilde wrote: > Hurray!!!! Nice work, Yadu. > > I think you need to do a few things here: > > - put in some (perhaps periodic) logging to see how far fib progressed > - try a non-recursive example: fib() is not reflective of most swift > scripts and excessively challenges the call stack > - learn how to control Java heap and stack limits (the former is > documented *I think* in the User Guide and if not needs to be, and should > be, in the release notes > > Others may have some more specific guidance for you. 
> > - Mike > > > ----- Original Message ----- > > From: "Yadu Nand" > > To: "swift-devel" > > Sent: Friday, March 1, 2013 1:23:27 PM > > Subject: [Swift-devel] Swift crashing for runs with 1M calls > > > > > > Hi, > > > > > > I'm trying to see swift behavior with some stress on and I see > > crashes at close to 1M calls/loops. > > > > > > Here's one such case ( x_func.swift code is here -> > > https://github.com/yadudoc/swift-basics/blob/master/stress/x_func.swift > > ): > > Test run with 0.94 ( corei5 with 8Gb ram) > > > > > > yadu at Miranda:~/src/swift-basics/stress$ time swift x_func.swift > > -loops=1000000 > > Swift 0.94RC3 swift-r6268 cog-r3605 > > > > > > RunID: 20130302-0019-l2alv5d2 > > Progress: time: Sat, 02 Mar 2013 00:19:38 +0530 > > Fibonacci[2] = 1 > > No events in 10s. > > Progress: time: Sat, 02 Mar 2013 00:20:08 +0530 > > Progress: time: Sat, 02 Mar 2013 00:20:38 +0530 > > Progress: time: Sat, 02 Mar 2013 00:21:08 +0530 > > Progress: time: Sat, 02 Mar 2013 00:21:38 +0530 > > Progress: time: Sat, 02 Mar 2013 00:22:08 +0530 > > Progress: time: Sat, 02 Mar 2013 00:22:38 +0530 > > Progress: time: Sat, 02 Mar 2013 00:23:08 +0530 > > Finding dependency loops...Exception in thread "Hang checker" > > java.lang.StackOverflowError > > at java.util.HashMap.put(HashMap.java:484) > > at java.util.HashSet.add(HashSet.java:217) > > at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:299) > > at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:303) > > > > > > Earlier I had posted about iterate failing to compile on 0.94. I ran > > a few tests on 0.93 > > (code : > > > https://github.com/yadudoc/swift-basics/blob/master/stress/x_iterate.swift > > ) with > > results: > > 100K loops -> 17s real > > 1M loops -> No response for a while, then java.lang.OutOfMemory > > exception thrown. > > > > > > I'm finding it difficult to keep stats so, I'm placing the stats and > > as much run info as > > possible in comments in the scripts themselves. 
All the scripts are > > here -> > > https://github.com/yadudoc/swift-basics/tree/master/stress > > > > > > > > > > -Yadu > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Fri Mar 1 14:07:15 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 1 Mar 2013 14:07:15 -0600 (CST) Subject: [Swift-devel] 0.94 release note draft In-Reply-To: <2079798110.2181490.1362143165573.JavaMail.root@ci.uchicago.edu> Message-ID: <1706178604.1625084.1362168435558.JavaMail.root@mcs.anl.gov> David, this is a great list, and a great start to the Release Notes doc. Before releasing, though, you need to take many of the items to the next level of detail. If we dont want to change the Userguide prior to release, write sections in the release notes that will go into trunk Userguide. Some notes, below. Some are easy to fix. For ones (eg MPI) where time may not permit, maybe say "Ask on the Swift list and watch the trunk doc for emerging details." ??? - Mike ----- Original Message ----- > From: "David Kelly" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Friday, March 1, 2013 7:06:05 AM > Subject: Re: 0.94 release note draft > > > Here is the list, based mostly on what I could find in the svn logs. > > > > - To behavior of iterate has changed from 0.93 to 0.94. If you have > scripts > that use iterate, please read > http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_iterate Spell out what changed. Further, the userguide section on iterate needs more clarity. 
Echo what changed, in the user guide (by adding a short note and example to the end marked "behavior prior to 0.94"). The example of i and j termination being different is important but needs to be clarified.

> - Walltimes are more strictly enforced by coasters.

Are now enforced, where before they were not.

> Previous to Swift version 0.94, if an application run with coasters would
> exceed its specified maxwalltime, it would be allowed to continue to
> execute. However, if this would cause the worker on which the application
> was running to exceed its maxwalltime, the queuing system would kill the
> worker. The resulting error message was not always very clear. Since
> version 0.94 coaster workers enforce the user-specified maxwalltime. If an
> application exceeds its maxwalltime, the coaster worker will not allow it
> to continue, but terminate it and report the error.

Need to explain the diff between maxtime and maxwalltime. This is lacking in the userguide.

> - Swift will now use camel case for functions, for example, @toInt instead
> of @toint. The previous naming convention will still work, but you may see
> deprecation warnings.

For example? What do the messages look like? Exactly how is case-difference treated? What were the old conventions and what are the new ones?

> - Associative arrays have been added. More details and examples can be
> found at
> http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_associative_arrays
>
> - Dynamic profiles. Many settings formerly only definable in sites.xml can
> now be set on a per-app basis. This can make things easier when running
> multiple apps that have different requirements for settings like
> processors per node and walltime.
> http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_dynamic_profiles
>
> - Added a new ssh command line provider. Previously ssh support was done by
> creating a file called ~/.ssh/auth.defaults. The ssh command line provider
> is more flexible and doesn't require this step. ssh-cl allows you to use
> SSH agents. You can use ssh-cl by adding something like this to your
> sites.xml:
>
> jobmanager="ssh-cl:pbs"/>
>
> - Many fixes and improvements to improve the reliability and performance of
> coaster provider staging.
>
> - Added support for the Slurm scheduler

How to use? What are the parameter override/passing issues?

> - Added support for the LSF scheduler

Ditto.

> - Improvements to condor provider (non-shared jobtype and more flexibility
> to define what gets added to the submit script).

How to use?

> - Fixes for the textual user interface (TUI). Adding the -tui option to the
> swift command line allows you to monitor progress in a curses based menu.
> A brief example of this can be found at
> http://www.ci.uchicago.edu/~davidk/modis.ogv.

Insert into README?

> - Added the ability to call Java methods within swift using @java.
> For example:
> float f = @java("java.lang.Math", "sin", 0.5);
> http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_java
>
> - Added a hang checker that provides the user with more information about
> potential hangs

Much more explanation needed here. What causes hangs? Which are real, which may be "just slow"? How to read the traceback?

> - @strjoin function for joining strings.
> http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_strjoin
>
> - If you have a requirement that a command get run on the worker node
> before processing any work, worker.pl will now execute commands stored in
> the environment variable $WORKER_INIT_CMD

How to format the var?

> - Use $SWIFT_USERHOME to determine where Swift should create some of its
> required files. This defaults to $HOME, but this may cause problems in
> some situations where $HOME is not accessible on worker nodes.

Unclear. Still a problem, or that's when to use this?

> - Experimental "wrapper staging" feature that delegates file staging to an
> external wrapper script.

Don't mention this; no one can use it!

> - Various improvements to the way that Swift runs MPI jobs.

Much much more needed here.

> - Better OSG integration/support using GlideinWMS.

Ditto.

> ----- Original Message -----
>
> From: "Michael Wilde"
> To: "David Kelly"
> Cc: "Swift Devel"
> Sent: Thursday, February 14, 2013 1:08:56 PM
> Subject: Fwd: 0.94 release note draft
>
> Some notes toward an 0.94 release notes document.
>
> There's a longer list, I think in an IM chat transcript, that we need
> to incorporate.
>
> Please send additional items to this thread for David to integrate.
>
> Thanks,
>
> - Mike
>
> ----- Forwarded Message -----
> From: "David Kelly"
> To: "Michael Wilde"
> Sent: Thursday, January 24, 2013 11:06:18 PM
> Subject: Re: 0.94 release note draft
>
> Mike,
>
> I just have the quick notes I took from our meeting. These combined
> with your emails are all the changes that I'm aware of at this
> point.
>
> Iterate differences
> Walltime hard limit with coasters
> Associative array changes
> Tracebacker
> Coaster changes / parameters
> Slurm and LSF providers
> Condor provider changes
> ssh-cl
> TUI
> hang checker
> @functions (strjoin, and possibly others)
> Dynamic profiles
> Wrapper staging
> Pass-thru (PBS attributes)
> MPI support
>
> ----- Original Message -----
> > From: "Michael Wilde"
> > To: "David Kelly"
> > Sent: Thursday, January 24, 2013 7:56:54 PM
> > Subject: 0.94 release note draft
> >
> > Hi David,
> >
> > I recall I sent you a few batches of line items to list in 0.94
> > release notes. Did you gather those somewhere where I can review
> > them?
> > (Need them for a status report)
> >
> > Thanks,
> >
> > - Mike
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory

From davidk at ci.uchicago.edu Fri Mar 1 14:28:25 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 1 Mar 2013 14:28:25 -0600 (CST)
Subject: [Swift-devel] Scripts with Iterate fails to compile
In-Reply-To: <1362163121.29670.1.camel@echo>
Message-ID: <140868916.2476436.1362169705030.JavaMail.root@ci.uchicago.edu>

Mihael,

It's probably worth discussing. Can you remind me, was variable closing reference counting the fix for bug #927, failure assigning a file array from within an if statement? If so, I think that is important to have because without that NCAR is unable to run their scripts in 0.94. They're in the process of migrating machines which will require the LSF provider, so 0.93 will not be an option for them for very long.

David

----- Original Message -----
From: "Mihael Hategan"
To: "David Kelly"
Cc: "Yadu Nand", "swift-devel"
Sent: Friday, March 1, 2013 12:38:41 PM
Subject: Re: [Swift-devel] Scripts with Iterate fails to compile

This seems to indicate that the variable closing reference counting is in 0.94. Are we sure we want that?

Mihael

On Fri, 2013-03-01 at 09:18 -0600, David Kelly wrote:
> Yadu,
>
> The script you wrote works in 0.93, but fails in trunk/0.94. Setting an
> array element from within an iterate statement seems to fail. The nightly
> test group we run doesn't test for that at the moment - good catch.
>
> ----- Original Message -----
> > From: "Yadu Nand"
> > To: "swift-devel"
> > Sent: Friday, March 1, 2013 8:49:42 AM
> > Subject: [Swift-devel] Scripts with Iterate fails to compile
> >
> > Hi,
> >
> > I'm seeing my scripts fail to compile just as the scripts in swift
> > tests fail with the same error:
> >
> > yadu at Miranda:~/swift-0.94/cog/modules/swift/tests/apps$ swift 050-stomp-skel-1.swift
> > Warning: Function toint is deprecated, at 9
> > Could not start execution
> > Failed to convert .swiftx to .kml for 050-stomp-skel-1.swift
> > no such attribute: refs in template context [iterate]
> >
> > The script I wrote which doesn't compile is here ->
> > https://github.com/yadudoc/swift-basics/blob/master/stress/x_iterate.swift
> >
> > My tests were on swift 0.94 run on my laptop (ubuntu 12.04).
> >
> > --
> > Thanks,
> > Yadu Nand B

From yadudoc1729 at gmail.com Fri Mar 1 14:28:08 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Sat, 2 Mar 2013 01:58:08 +0530
Subject: [Swift-devel] Swift crashing for runs with 1M calls
In-Reply-To:
References: <680226079.1624588.1362168090499.JavaMail.root@mcs.anl.gov>
Message-ID:

Thanks Mike, today was a good day :)

Tim, sure, I can try that once we have a large enough net to catch bugs here.

I will try adding some logging to see progress on the fibonacci code. Here are the stats for the fibonacci code, and as Tim noted the code is not recursive.

fibonacci stats (x_func.swift):
1K -> 3.059s
10K -> 5.047s (double check results here)
100K -> 31.85s
1M -> Exception in thread "Hang checker" java.lang.StackOverflowError
    at java.util.HashMap.put(HashMap.java:484)
    at java.util.HashSet.add(HashSet.java:217)

I have a recursion code to calculate N+(N-1)... +2+1 (code : x_recursion.swift).
This one fails much faster and with very different logs and exceptions:

x_recursion stats:
1K -> 25s real (i5 8gb)
10K -> (10% ram 8gb in use by java)
Uncaught exception: java.lang.StackOverflowError in vdl:unitstart @ x_recursion.kml, line: 45
java.lang.StackOverflowError
    at java.lang.String.valueOf(String.java:2959)
    at org.globus.cog.karajan.util.ThreadingContext.toString(ThreadingContext.java:87)
    at org.globus.cog.karajan.util.ThreadingContext.toString(ThreadingContext.java:87)
    ...
Exception is: java.lang.StackOverflowError
Near Karajan line: vdl:unitstart @ x_recursion.kml, line: 45
Another uncaught exception while handling an uncaught exception.
java.lang.StackOverflowError
    at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:77)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.failed(FlowNode.java:245)
    ...
The initial exception was
java.lang.StackOverflowError
    at java.lang.String.valueOf(String.java:2959)
    at org.globus.cog.karajan.util.ThreadingContext.toString(ThreadingContext.java:87)
    at org.globus.cog.karajan.util.ThreadingContext.toString(ThreadingContext.java:87)
100K -> ?

-Yadu

--
Thanks and Regards,
Yadu Nand B

From davidk at ci.uchicago.edu Fri Mar 1 15:13:34 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 1 Mar 2013 15:13:34 -0600 (CST)
Subject: [Swift-devel] 0.94 release note draft
In-Reply-To: <1362162963.29670.0.camel@echo>
Message-ID: <1622402011.2580439.1362172414740.JavaMail.root@ci.uchicago.edu>

I think it is there. It looks like your memory fixes were committed on 2-2, and the current 0.94 branch is based on a snapshot of trunk from 2-15.

----- Original Message -----
> From: "Mihael Hategan"
> To: "David Kelly"
> Cc: "Michael Wilde", "Swift Devel"
> Sent: Friday, March 1, 2013 12:36:03 PM
> Subject: Re: [Swift-devel] 0.94 release note draft
>
> So I don't think I committed the memory leak fixes to 0.94 and I think
> they should be there.
>
> Mihael

From lpesce at uchicago.edu Fri Mar 1 15:32:20 2013
From: lpesce at uchicago.edu (Lorenzo Pesce)
Date: Fri, 1 Mar 2013 15:32:20 -0600
Subject: [Swift-devel] 0.94 release note draft
In-Reply-To: <1622402011.2580439.1362172414740.JavaMail.root@ci.uchicago.edu>
References: <1622402011.2580439.1362172414740.JavaMail.root@ci.uchicago.edu>
Message-ID: <5C0CBEC1-DF0B-439D-9E69-7D81BDAD5091@uchicago.edu>

David,

Have mercy on me having lost track of the situation. Maybe tomorrow, most definitely next week I will start to increase the size of some of my swift runs till I hit the thousands of jobs. Some will be very heavy in I/O and computations (and pipeline complexity). I have taken today off to recover and be ready.

What do you suggest should be my approach to this problem? I am currently trying to develop pipelines with

SWIFT_HOME=/soft/swift/0.94-2012.1102
or
#SWIFT_HOME=/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn

then move them to fast and see where it breaks.

Should I change the module I am using? I need to learn how to make the installation work and get over my hatred for java...

All the pipelines I will be trying are expected eventually to reach sustained runs of 10K+ jobs in parallel with workflows as deep as 20 stages (including steps that increase the jobs by one or two orders of magnitude with a total file load of a few tens of TBs).

Do you think that it is sensible?

We will then move to other machines and hopefully test fusion and other approaches.

Lorenzo

From davidk at ci.uchicago.edu Fri Mar 1 16:22:24 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 1 Mar 2013 16:22:24 -0600 (CST)
Subject: [Swift-devel] 0.94 release note draft
In-Reply-To: <5C0CBEC1-DF0B-439D-9E69-7D81BDAD5091@uchicago.edu>
Message-ID: <2046830530.2678617.1362176544135.JavaMail.root@ci.uchicago.edu>

Lorenzo,

That seems reasonable to me. I haven't used the faster branch myself much at this point, so it's probably best to continue talking with Mihael and Mike to come up with a good strategy for how to approach those issues. But in terms of the swift versions/modules, I have put together the three latest builds for you at:

/home/davidk/swift-0.94-03012013/cog/modules/swift/dist/swift-svn
/home/davidk/swift-faster-03012013/cog/modules/swift/dist/swift-svn
/home/davidk/swift-trunk-03012013/cog/modules/swift/dist/swift-svn

It's probably better to use these rather than ~davidk/swift-trunk, because I tend to use that for testing.

Hope this helps.
David ----- Original Message ----- > From: "Lorenzo Pesce" > To: "David Kelly" > Cc: "Mihael Hategan" , "Swift Devel" > > Sent: Friday, March 1, 2013 3:32:20 PM > Subject: Re: [Swift-devel] 0.94 release note draft > David, > Have mercy on me for having lost track of the situation. Maybe tomorrow, > most definitely next week I will start to increase the size of some > of my swift runs till I hit the thousands of jobs. > Some will be very heavy in I/O and computations (and pipeline > complexity). > I have taken today off to recover and be ready. > What do you suggest should be my approach to this problem? > I am currently trying to develop pipelines with > SWIFT_HOME=/soft/swift/0.94-2012.1102 > or > #SWIFT_HOME=/home/davidk/swift-trunk/cog/modules/swift/dist/swift-svn > then move them to fast and see where it breaks. > Should I change the module I am using? I need to learn how to make > the installation work and get over my hatred for Java... > All the pipelines I will be trying are expected eventually to reach > sustained runs of 10K+ jobs in parallel with workflows as deep as 20 > stages (including steps that increase the jobs by one or two orders > of magnitude with a total file load of a few tens of TBs) > Do you think that it is sensible? > We will then move to other machines and hopefully test fusion and > other approaches. > Lorenzo > On Mar 1, 2013, at 3:13 PM, David Kelly wrote: > > I think it is there. It looks like your memory fixes were committed > > on 2-2, and the current 0.94 branch is based on a snapshot of trunk > > from 2-15.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From lpesce at uchicago.edu Fri Mar 1 19:08:43 2013 From: lpesce at uchicago.edu (Lorenzo Pesce) Date: Fri, 1 Mar 2013 19:08:43 -0600 Subject: [Swift-devel] 0.94 release note draft In-Reply-To: <2046830530.2678617.1362176544135.JavaMail.root@ci.uchicago.edu> References: <2046830530.2678617.1362176544135.JavaMail.root@ci.uchicago.edu> Message-ID: <73C90765-6B8A-4BB3-9E98-D69507797716@uchicago.edu> Thanks a million. I will proceed with testing and try to report problems timely and clearly. Things are looking good.
> > From: "Mihael Hategan" > To: "David Kelly" > Cc: "Michael Wilde" , "Swift Devel" > Sent: Friday, March 1, 2013 12:36:03 PM > Subject: Re: [Swift-devel] 0.94 release note draft > > So I don't think I committed the memory leak fixes to 0.94 and I think > they should be there. > > Mihael > > On Fri, 2013-03-01 at 07:06 -0600, David Kelly wrote: > > Here is the list, based mostly on what I could find in the svn logs. > > > > > > > > - The behavior of iterate has changed from 0.93 to 0.94. If you have scripts > > that use iterate, please read > > http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_iterate > > > > > > - Walltimes are more strictly enforced by coasters. Prior to Swift version > > 0.94, if an application run with coasters would exceed its specified > > maxwalltime, it would be allowed to continue to execute. However, if this > > would cause the worker on which the application was running to exceed its > > maxwalltime, the queuing system would kill the worker. The resulting error > > message was not always very clear. Since version 0.94 coaster workers enforce > > the user-specified maxwalltime. If an application exceeds its maxwalltime, > > the coaster worker will not allow it to continue, but terminate it and report > > the error. > > > > > > - Swift will now use camel case for functions, for example, @toInt instead of > > @toint. The previous naming convention will still work, but you may see > > deprecation warnings. > > > > > > - Associative arrays have been added. More details and examples can be found at > > http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_associative_arrays > > > > > > - Dynamic profiles. Many settings formerly only definable in sites.xml can now > > be set on a per-app basis. This can make things easier when running > > multiple apps that have different requirements for settings like processors > > per node and walltime.
> > http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_dynamic_profiles > > > > > > - Added a new ssh command line provider. Previously ssh support was done by > > creating a file called ~/.ssh/auth.defaults. The ssh command line provider > > is more flexible and doesn't require this step. ssh-cl allows you to use > > SSH agents. You can use ssh-cl by adding something like this to your sites.xml: > > > > > > > > > > > > - Many fixes and improvements to the reliability and performance of > > coaster provider staging. > > > > > > - Added support for the Slurm scheduler > > > > > > - Added support for the LSF scheduler > > > > > > - Improvements to condor provider (non-shared jobtype and more flexibility > > to define what gets added to the submit script). > > > > > > - Fixes for the textual user interface (TUI). Adding the -tui option to the swift > > command line allows you to monitor progress in a curses-based menu. A brief > > example of this can be found at http://www.ci.uchicago.edu/~davidk/modis.ogv. > > > > > > - Added the ability to call Java methods within swift using @java. For example: > > float f = @java("java.lang.Math", "sin", 0.5); > > http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_java > > > > > > - Added a hang checker that provides the user with more information about > > potential hangs > > > > > > - @strjoin function for joining strings. > > http://www.ci.uchicago.edu/swift/guides/release-0.94/userguide/userguide.html#_strjoin > > > > > > - If you have a requirement that a command get run on the worker node before > > processing any work, worker.pl will now execute commands stored in the > > environment variable $WORKER_INIT_CMD > > > > > > - Use $SWIFT_USERHOME to determine where swift should create some of its > > required files. This defaults to $HOME, but this may cause problems in some > > situations where $HOME is not accessible on worker nodes.
> > > > > > - Experimental "wrapper staging" feature that delegates file staging to an > > external wrapper script. > > > > > > - Various improvements to the way that Swift runs MPI jobs. > > > > > > - Better OSG integration/support using GlideinWMS. > > > > > > ----- Original Message ----- > > > > > > From: "Michael Wilde" > > To: "David Kelly" > > Cc: "Swift Devel" > > Sent: Thursday, February 14, 2013 1:08:56 PM > > Subject: Fwd: 0.94 release note draft > > > > > > Some notes toward an 0.94 release notes document. > > > > There's a longer list, I think in an IM chat transcript, that we need to incorporate. > > > > Please send additional items to this thread for David to integrate. > > > > Thanks, > > > > - Mike > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Mar 3 01:11:15 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 02 Mar 2013 23:11:15 -0800 Subject: [Swift-devel] Scripts with Iterate fails to compile In-Reply-To: <140868916.2476436.1362169705030.JavaMail.root@ci.uchicago.edu> References: <140868916.2476436.1362169705030.JavaMail.root@ci.uchicago.edu> Message-ID: <1362294675.4684.4.camel@echo> On Fri, 2013-03-01 at 14:28 -0600, David Kelly wrote: > Mihael, > > > It's probably worth discussing. Can you remind me, was variable > closing reference counting was the fix for bug #927, failure assigning > a file array from within an if statement? No. 927 was introduced with the reference counting code. > If so, I think that is important to have because without that ncar is > unable to run their scripts in 0.94. They're in the process of > migrating machines which will require the LSF provider, so 0.93 will > not be an option for them for very long. Well, I don't know. 
There is always the conflict between having stable code and having the latest stuff. Anyway, I fixed the issue in SVN (swift r6326). But I also noticed that when trunk was re-branched to 0.94, at least one fix (the one about swift warning that all user procedures are deprecated) got lost. So I think that we should be careful to merge the bug fixes that were there in the old 0.94 branch. Mihael From wilde at mcs.anl.gov Sun Mar 3 17:38:18 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 3 Mar 2013 17:38:18 -0600 (CST) Subject: [Swift-devel] Does 0.94 need more fixes? In-Reply-To: <20130303202614.CC5D19D015@svn.ci.uchicago.edu> Message-ID: <39485068.29733.1362353898957.JavaMail.root@mcs.anl.gov> Mihael, David, Yadu, I'm a bit confused by recent email threads on bug fixes in 0.94: - the issue of incorrect "deprecated" messages from user camelCase functions -- how does this relate to the fixes for the same issue that David developed? - issues of array closing logic? - fixes for memory leakage / efficiency - other things? For all of these: are all the right fixes now in 0.94? And does 0.94 now need a complete new release certification test cycle? David, can you continue to work with Mihael to decide? Sounds like you're on top of this already. Let's do the right things wrt 0.94 even if it means more test cycles. Thanks, - Mike From wozniak at mcs.anl.gov Mon Mar 4 10:13:53 2013 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 04 Mar 2013 10:13:53 -0600 Subject: [Swift-devel] Swift Weekly code discussions Message-ID: <5134C841.50505@mcs.anl.gov> Hi all, As discussed on swift-devel, we are going to start weekly code discussion meetings. We will do screen sharing and recording for everyone's benefit, regardless of location. I set up a Doodle poll for the time; link is below. We will set the time by the end of this week so we can start next week, that is, on/after March 11.
Justin http://doodle.com/8ddu6zptb2uf7fgw -- Justin M Wozniak From hategan at mcs.anl.gov Tue Mar 5 01:27:26 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 04 Mar 2013 23:27:26 -0800 Subject: [Swift-devel] stracing jobs Message-ID: <1362468446.31350.4.camel@echo> I added an option in the faster branch to strace wrapper invocations. It's enabled with -Djob.perf.trace.interval=n and setting worker logging level to WARN. If set, every nth job submitted by the coaster service to a worker will drop the logging level of that worker to DEBUG for the duration of the job and invoke the wrapper with strace, putting the output in the same dir as the worker logs. Tracing every job would be an unacceptable (and not very useful) overhead, so this allows doing it sporadically so that the numbers are more meaningful. Mihael From yadudoc1729 at gmail.com Thu Mar 7 05:44:13 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 7 Mar 2013 17:14:13 +0530 Subject: [Swift-devel] Fold tests fail. Message-ID: Hi, I am trying to do a fold of results in swift, and I see some very weird results/errors. Here's my code : https://github.com/yadudoc/swift-basics/blob/master/stress/patterns/x_treefold.swift I've also attached it, in case that's easier. On 0.93, here's a clip of the output showing that integer equality is failing Start, Mid : (0, 1) Mid+1, End : (2, 2) Equal start != mid 0 : 1 Equal mid+1 == end Equal start != mid 3 : 3 <--- This shouldn't happen ? Equal start != mid 4 : 4 <--- This shouldn't happen ? Start, Mid : (3, 3) Mid+1, End : (4, 4) Equal start != mid 3 : 3 <--- This shouldn't happen ? On 0.94, I get a runtime error : Execution failed: Equal start != mid 0 : 2 Equal start != mid 3 : 4 Internal type error. Expected a Integer. Got null On Swift-faster, the code does not get past compilation as a circular dependency is detected. Do we not support recursion on the swift-faster branch ?
org.griphyn.vdl.karajan.CompilationException: Circular procedure dependency detected at org.griphyn.vdl.engine.Karajan.visit(Karajan.java:308) at org.griphyn.vdl.engine.Karajan.visit(Karajan.java:320) at org.griphyn.vdl.engine.Karajan.processProcedures(Karajan.java:296) at org.griphyn.vdl.engine.Karajan.program(Karajan.java:358) at org.griphyn.vdl.engine.Karajan.compile(Karajan.java:138) at org.griphyn.vdl.karajan.Loader.compile(Loader.java:340) at org.griphyn.vdl.karajan.Loader.main(Loader.java:165) Could not start execution: org.griphyn.vdl.karajan.CompilationException: Circular procedure dependency detected I would really appreciate help in understanding what is going on here. -- Thanks and Regards, Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: x_treefold.swift Type: application/octet-stream Size: 1357 bytes Desc: not available URL: From hategan at mcs.anl.gov Fri Mar 8 13:02:38 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 08 Mar 2013 11:02:38 -0800 Subject: [Swift-devel] Fold tests fail. In-Reply-To: References: Message-ID: <1362769358.24064.2.camel@echo> I fixed the run-time errors in 0.94 and faster as well as the compilation errors in faster. There was an issue in 0.94 (and faster) related to the reference count code that caused variables in else branches that are assigned from a procedure call to be closed before they should have been (leading to them having a value of null). Mihael On Thu, 2013-03-07 at 17:14 +0530, Yadu Nand wrote: > Hi, > > I am trying to do fold of results in swift, and I see some very weird > results/errors. > > Here's my code : > https://github.com/yadudoc/swift-basics/blob/master/stress/patterns/x_treefold.swift > I've also attached it, in case that's easier. 
> > On 0.93, Here's a clip of the output showing that integer equality is > failing > Start, Mid : (0, 1) Mid+1, End : (2, 2) > Equal start != mid 0 : 1 > Equal mid+1 == end > Equal start != mid 3 : 3 <--- This should'nt happen ? > Equal start != mid 4 : 4 <--- This should'nt happen ? > Start, Mid : (3, 3) Mid+1, End : (4, 4) > Equal start != mid 3 : 3 <--- This should'nt happen ? > > On 0.94, I get an runtime error : > Execution failed: > Equal start != mid 0 : 2 > Equal start != mid 3 : 4 > Internal type error. Expected a Integer. Got null > > On Swift-faster, The code does not get past compilation as a circular > dependency > is detected. Do we not support recursion on swift-faster branch ? > > org.griphyn.vdl.karajan.CompilationException: Circular procedure dependency > detected > at org.griphyn.vdl.engine.Karajan.visit(Karajan.java:308) > at org.griphyn.vdl.engine.Karajan.visit(Karajan.java:320) > at org.griphyn.vdl.engine.Karajan.processProcedures(Karajan.java:296) > at org.griphyn.vdl.engine.Karajan.program(Karajan.java:358) > at org.griphyn.vdl.engine.Karajan.compile(Karajan.java:138) > at org.griphyn.vdl.karajan.Loader.compile(Loader.java:340) > at org.griphyn.vdl.karajan.Loader.main(Loader.java:165) > Could not start execution: > org.griphyn.vdl.karajan.CompilationException: Circular procedure dependency > detected > > I would really appreciate help in understanding what is going on here. > > > -- > Thanks and Regards, > Yadu Nand B > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From wilde at mcs.anl.gov Sat Mar 9 15:46:36 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 9 Mar 2013 15:46:36 -0600 (CST) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1370387327.1230714.1362865548698.JavaMail.root@mcs.anl.gov> Message-ID: <144448535.1230716.1362865596648.JavaMail.root@mcs.anl.gov> Mihael, can you advise on this problem? David and I are trying to run automatic coaster jobs from midway login hosts and swift.rcc to beagle using ssh-cl:pbs. My failed attempts are on midway under /home/wilde/osgdemo/modis/svn, see eg run020 (which has complete logs).
Quick question about the proxy files that get copied. Does this look OK? 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23 19:25:53 GMT 2013 The proxy expiration time listed above is two hours *earlier* than the current time (as seen in the message's UTC timestamp). Is that correct, or a possible cause of this problem? The main symptom seems to be this: Execution failed: Exception in getlanduse: Arguments: [../data/modis/2002/h00v09.rgb] Host: beagle Directory: modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l Caused by: Could not submit job Caused by: Could not start coaster service Caused by: Task ended before registration was received. Failed to download bootstrap jar from http://midway001.rcc.uchicago.edu:50001 --- Yet I've verified that midway login4 (which is the target system) can connect to this hostname and port (with nc -l and telnet) - Mike -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Sat Mar 9 15:56:36 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 09 Mar 2013 13:56:36 -0800 Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <144448535.1230716.1362865596648.JavaMail.root@mcs.anl.gov> References: <144448535.1230716.1362865596648.JavaMail.root@mcs.anl.gov> Message-ID: <1362866196.24889.0.camel@echo> Can you post .globus/coasters/coaster.log from beagle? On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote: > Mihael, can you advise on this problem? > > David and I are trying to run automatic coaster jobs from midway login hosts and swift.rcc to beagle using ssh-cl:pbs. > > My failed attempts are on midway under /home/wilde/osgdemo/modis/svn, see eg run020 (which has complete logs).
> > Quick question about the proxy files that get copied. Does this look OK? : > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23\ > 19:25:53 GMT 2013 > > The proxy expiration time listed above is two hours *earlier* than the current time (as seen in the message's UTC timestamp). Is that correct, or a possible cause of this problem? > > The main symptom seems to be this: > > Execution failed: > Exception in getlanduse: > Arguments: [../data/modis/2002/h00v09.rgb] > Host: beagle > Directory: modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l > > Caused by: > Could not submit job > Caused by: > Could not start coaster service > Caused by: > Task ended before registration was received. > Failed to download bootstrap jar from http://midway001.rcc.uchicago.edu:50001 > --- > > Yet Ive verified that midway login4 (which is the target system) can connect to this hostname and port (with nc -l and telnet) > > - Mike > > From wilde at mcs.anl.gov Sat Mar 9 15:59:22 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 9 Mar 2013 15:59:22 -0600 (CST) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1362866196.24889.0.camel@echo> Message-ID: <1195019919.1230849.1362866362717.JavaMail.root@mcs.anl.gov> I think we just got this working. Problems may have included the need to pre-create the work directory and to specify a dotted IP address on the external network for GLOBUS_HOSTNAME. Will need to experiment. So the proxy expiration time was likely not the problem (although it's confusing). Will report back on this once the needed steps are clear.
Thanks,

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Michael Wilde"
> Cc: "Swift Devel"
> Sent: Saturday, March 9, 2013 3:56:36 PM
> Subject: Re: Cant get auto-coasters to run from midway to beagle
>
> Can you post .globus/coasters/coaster.log from beagle?
>
> On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote:
> > Mihael, can you advise on this problem?
> >
> > David and I are trying to run automatic coaster jobs from midway login hosts and swift.rcc to beagle using ssh-cl:pbs.
> >
> > My failed attempts are on midway under /home/wilde/osgdemo/modis/svn, see eg run020 (which has complete logs).
> >
> > Quick question about the proxy files that get copied. Does this look OK? :
> >
> > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem
> > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23 19:25:53 GMT 2013
> >
> > The proxy expiration time listed above is two hours *earlier* than the current time (as seen in the message's UTC timestamp). Is that correct, or a possible cause of this problem?
> >
> > The main symptom seems to be this:
> >
> > Execution failed:
> > Exception in getlanduse:
> > Arguments: [../data/modis/2002/h00v09.rgb]
> > Host: beagle
> > Directory: modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l
> >
> > Caused by:
> > Could not submit job
> > Caused by:
> > Could not start coaster service
> > Caused by:
> > Task ended before registration was received.
> > Failed to download bootstrap jar from http://midway001.rcc.uchicago.edu:50001
> > ---
> >
> > Yet I've verified that midway login4 (which is the target system) can connect to this hostname and port (with nc -l and telnet)
> >
> > - Mike

From wilde at mcs.anl.gov Sat Mar 9 16:11:24 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 9 Mar 2013 16:11:24 -0600 (CST)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1195019919.1230849.1362866362717.JavaMail.root@mcs.anl.gov>
Message-ID: <1806095379.1231142.1362867084821.JavaMail.root@mcs.anl.gov>

Now I'm getting the error below (from running 317 simple MODIS apps concurrently). I'm going to dial down the throttle first to see if the staging load is overwhelming either coasters or the midway-beagle path.

- Mike

From wilde at mcs.anl.gov Sat Mar 9 16:24:17 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 9 Mar 2013 16:24:17 -0600 (CST)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1806095379.1231142.1362867084821.JavaMail.root@mcs.anl.gov>
Message-ID: <1231351823.1231245.1362867857147.JavaMail.root@mcs.anl.gov>

I forgot to paste the error, sorry. It's below now, fer real. When I dial down the throttle to 48 and only start 2 beagle nodes, I get further and the app calls make it to active state. The 317 files being staged in here are 17MB each.
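To put the staging load in rough numbers, here is a back-of-the-envelope sketch using the figures from this message (317 files, 17MB each, throttle of 48). It assumes every file is exactly 17 MB and that the throttle caps the number of files staged at once; both are simplifications, and the constant names are illustrative only.

```python
# Figures taken from the message above; names are illustrative.
FILE_COUNT = 317      # MODIS input files staged in
FILE_SIZE_MB = 17     # approximate size of each file
THROTTLE = 48         # reduced concurrency used in the second run

# Total data staged over the whole run.
total_mb = FILE_COUNT * FILE_SIZE_MB

# Data in flight at once: unthrottled (all 317) versus throttled (48).
in_flight_unthrottled_mb = FILE_COUNT * FILE_SIZE_MB
in_flight_throttled_mb = THROTTLE * FILE_SIZE_MB

print(total_mb, in_flight_unthrottled_mb, in_flight_throttled_mb)
```

Under these assumptions the unthrottled run tries to stage roughly 5.4 GB concurrently, versus about 0.8 GB with the throttle at 48.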
The swift progress output and error are below:

RunID: 20130309-2204-qu9ck076
Progress: time: Sat, 09 Mar 2013 22:04:34 +0000
Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 Submitted:1
Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 Submitted:316
Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 Submitted:292
Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 Submitted:249
Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 Submitted:204
Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 Submitted:152
Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 Submitted:140
Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 Submitted:92
Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 Submitted:76
Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 Submitted:28
Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 Submitted:12
Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:06:04 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] -> BufferingChannel, null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] -> BufferingChannel}
Context: service-60822
Meta context: service-60640
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] -> BufferingChannel, null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] -> BufferingChannel}
Context: service-60116
Meta context: service-60640
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] -> BufferingChannel, null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] -> BufferingChannel}
Context: service-60598
Meta context: service-60640
Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317
Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 Active:1
Execution failed:
Exception in getlanduse:
Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb]
Host: beagle
Directory: modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l

Caused by:
Shutting down worker
getLandUse, modis02.swift, line 20
error null

real 4m36.777s
user 2m55.240s
sys 0m3.837s

---

With a throttle of 48 (.47) and 2 beagle nodes, I see:

Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)

RunID: 20130309-2214-1oi3rvea
Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1
Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47
Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23
Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48
Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1
Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36 Active:12
Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
Execution failed:
Exception in getlanduse:
Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
Host: beagle
Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l

Caused by:
Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
getLandUse, modis02.swift, line 20

real 2m31.463s
user 1m33.238s
sys 0m2.160s
+ mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed

This error is likely in the demo app code; just pasting here to show that with less concurrency it makes progress.

From wilde at mcs.anl.gov Sat Mar 9 17:05:25 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 9 Mar 2013 17:05:25 -0600 (CST)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1231351823.1231245.1362867857147.JavaMail.root@mcs.anl.gov>
Message-ID: <424118075.1231990.1362870325913.JavaMail.root@mcs.anl.gov>

Mihael, now I think I have a coaster problem. Curiously, it always seems to happen at about 5 mins into the run.

Logs for these runs are on midway in, e.g., /home/wilde/osgdemo/modis/svn/run027. The leading portion of the error from stdout/err is below.
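The "about 5 mins" observation can be checked against the Progress timestamps in this thread. The sketch below is only an illustration: it hard-codes the first and last Progress timestamps of the two runs that ended with "Shutting down worker" (copied from the pasted output), and the helper name `elapsed_seconds` is made up for this example.

```python
from datetime import datetime

# Format of the timestamps in the swift "Progress: time:" lines.
FMT = "%a, %d %b %Y %H:%M:%S %z"

def elapsed_seconds(first: str, last: str) -> float:
    """Seconds between two Progress timestamps given as strings."""
    return (datetime.strptime(last, FMT) - datetime.strptime(first, FMT)).total_seconds()

# (first Progress line, last Progress line before "Execution failed"),
# keyed by RunID, copied from the logs in this thread.
runs = {
    "20130309-2204-qu9ck076": ("Sat, 09 Mar 2013 22:04:34 +0000",
                               "Sat, 09 Mar 2013 22:09:08 +0000"),
    "20130309-2252-x37dmuy0": ("Sat, 09 Mar 2013 22:52:24 +0000",
                               "Sat, 09 Mar 2013 22:56:48 +0000"),
}

elapsed = {run_id: elapsed_seconds(a, b) for run_id, (a, b) in runs.items()}
for run_id, secs in elapsed.items():
    print(run_id, secs)
```

Both runs fail between four and five minutes after the first Progress line, which fits the pattern described above.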
- Mike Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) RunID: 20130309-2252-x37dmuy0 Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 Submitting:47 Submitted:1 Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 Stage in:1 Submitted:47 Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 Stage in:47 Active:1 Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 Stage in:42 Active:6 Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 Stage in:24 Active:24 Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 Active:47 Stage out:1 Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 Stage in:2 Submitted:1 Active:44 Stage out:1 Finished successfully:3 Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 Finished successfully:8 Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 Stage in:12 Submitting:3 Active:24 Stage out:8 Finished successfully:16 Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 Stage in:23 Submitting:5 Active:15 Stage out:4 Finished successfully:29 Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 Stage 
in:48 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 Stage in:47 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 Stage in:48 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60231 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 Stage in:47 Finished successfully:50 Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 Stage in:47 Submitted:1 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at 
id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60507 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60742 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50 Execution failed: Exception in getlanduse: Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] Host: beagle Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l Caused by: Shutting down worker getLandUse, modis02.swift, line 20 Attempted to unregister unregistered handler with id 526 Attempted to unregister unregistered handler with id 534 Attempted to unregister unregistered handler with id 430 Attempted to unregister unregistered handler with id 476 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 337 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226 ----- Original Message 
----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Saturday, March 9, 2013 4:24:17 PM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > I forgot to paste the error, sorry. Its below now, fer real. When I > dial down the throttle to 48 and only start 2 beagle nodes, I get > further and the app calls make it to active state. The 317 files > being staged in here are 17MB each. > > The swift progress output and error are below: > > RunID: 20130309-2204-qu9ck076 > Progress: time: Sat, 09 Mar 2013 22:04:34 +0000 > Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 > Submitted:1 > Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 > Submitted:316 > Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 > Submitted:292 > Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 > Submitted:249 > Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 > Submitted:204 > Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 > Submitted:152 > Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 > Submitted:140 > Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 > Submitted:92 > Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 > Submitted:76 > Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 > Submitted:28 > Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 > Submitted:12 > Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:06:04 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > 
GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > -> BufferingChannel, > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > -> BufferingChannel} > Context: service-60822 > Meta context: service-60640 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > -> BufferingChannel, > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > -> BufferingChannel} > Context: service-60116 > Meta context: service-60640 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > -> BufferingChannel, > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > -> BufferingChannel} > Context: service-60598 > Meta context: service-60640 > Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317 > Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 > Active:1 
> Execution failed: > Exception in getlanduse: > Arguments: > [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb] > Host: beagle > Directory: > modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l > > Caused by: > Shutting down worker > getLandUse, modis02.swift, line 20 > error null > > real 4m36.777s > user 2m55.240s > sys 0m3.837s > > > --- > > With a throttle of 48 (.47) and 2 beagle nodes, I see: > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2214-1oi3rvea > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 > Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 > Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 > Stage in:25 Submitted:23 > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 > Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 > Stage in:36 Active:12 > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 > Stage in:24 Active:24 > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 > Stage in:24 Active:23 Stage out:1 > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 > Stage in:14 Active:33 Stage out:1 > Execution failed: > Exception in getlanduse: > Arguments: > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > Host: beagle > Directory: > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > Caused by: > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed > with an exit code of 1 > getLandUse, 
modis02.swift, line 20 > > real 2m31.463s > user 1m33.238s > sys 0m2.160s > + mv /home/wilde/.swift/runs/current/run024.1362867244 > /home/wilde/.swift/runs/completed > > This error is likely in the demo app code; just pasting here to show > that with less concurrency it makes progress. > > ----- Original Message ----- > > From: "Michael Wilde" > > To: "Mihael Hategan" > > Cc: "Swift Devel" > > Sent: Saturday, March 9, 2013 4:11:24 PM > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > midway to beagle > > > > Now Im getting the error below (from running 317 simple MODIS apps > > concurrently). Im going to dial down the throttle first to see if > > the staging load is overwhelming either coasters or the > > midway-beagle path. > > > > - Mike > > > > > > ----- Original Message ----- > > > From: "Michael Wilde" > > > To: "Mihael Hategan" > > > Cc: "Swift Devel" > > > Sent: Saturday, March 9, 2013 3:59:22 PM > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > midway to beagle > > > > > > I think we just got this working. Problems may have included the > > > need > > > to pre-create the workdirectory and to specify a dotted IP > > > address > > > on the external network for GLOBUS_HOSTNAME. Will need to > > > experiment. So likely that proxy expiration time was not a > > > problem > > > (although its confusing). > > > > > > Will report back on this once the needed steps are clear. > > > > > > Thanks, > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > From: "Mihael Hategan" > > > > To: "Michael Wilde" > > > > Cc: "Swift Devel" > > > > Sent: Saturday, March 9, 2013 3:56:36 PM > > > > Subject: Re: Cant get auto-coasters to run from midway to > > > > beagle > > > > > > > > Can you post ,globus/coasters/coaster.log from beagle? > > > > > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote: > > > > > Mihael, can you advise on this problem? 
> > > > > > > > > > David and I are trying to run automatic coaster jobs from > > > > > midway > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs. > > > > > > > > > > My failed attempts are on midway under > > > > > /home/wilde/osgdemo/modis/svn, see eg run020 (which has > > > > > complete > > > > > logs). > > > > > > > > > > Quick question about the proxy files that get copied. Does > > > > > this > > > > > look OK? : > > > > > > > > > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking > > > > > certificate > > > > > /home/wilde/.globus/coasters/proxy.0.pem > > > > > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate > > > > > /home/wilde/.globus/coasters/proxy.0.pem with expiration date > > > > > Sat > > > > > Mar 23\ > > > > > 19:25:53 GMT 2013 > > > > > > > > > > The proxy expiration time listed above is two hours *earlier* > > > > > than > > > > > the current time (as seen in the message's UTC timestamp). > > > > > Is > > > > > that correct, or a possible cause of this problem? > > > > > > > > > > The main symptom seems to be this: > > > > > > > > > > Execution failed: > > > > > Exception in getlanduse: > > > > > Arguments: [../data/modis/2002/h00v09.rgb] > > > > > Host: beagle > > > > > Directory: > > > > > modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l > > > > > > > > > > Caused by: > > > > > Could not submit job > > > > > Caused by: > > > > > Could not start coaster service > > > > > Caused by: > > > > > Task ended before registration was received. 
> > > > > Failed to download bootstrap jar from > > > > > http://midway001.rcc.uchicago.edu:50001 > > > > > --- > > > > > > > > > > Yet I've verified that midway login4 (which is the target > > > > > system) > > > > > can connect to this hostname and port (with nc -l and telnet) > > > > > > > > > > - Mike From wilde at mcs.anl.gov Sat Mar 9 17:09:16 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 9 Mar 2013 17:09:16 -0600 (CST) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <424118075.1231990.1362870325913.JavaMail.root@mcs.anl.gov> Message-ID: <1522642899.1232101.1362870556551.JavaMail.root@mcs.anl.gov> See instead run028. Errors below.
Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) RunID: 20130309-2252-x37dmuy0 Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 Submitting:47 Submitted:1 Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 Stage in:1 Submitted:47 Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 Stage in:47 Active:1 Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 Stage in:42 Active:6 Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 Stage in:24 Active:24 Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 Active:47 Stage out:1 Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 Stage in:2 Submitted:1 Active:44 Stage out:1 Finished successfully:3 Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 Finished successfully:8 Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 Stage in:12 Submitting:3 Active:24 Stage out:8 Finished successfully:16 Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 Stage in:23 Submitting:5 Active:15 Stage out:4 Finished successfully:29 Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 Stage in:48 
Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 Stage in:47 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 Stage in:48 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60231 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 Stage in:47 Finished successfully:50 Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 Stage in:47 Submitted:1 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at 
id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60507 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60742 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50 Execution failed: Exception in getlanduse: Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] Host: beagle Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l Caused by: Shutting down worker getLandUse, modis02.swift, line 20 Attempted to unregister unregistered handler with id 526 Attempted to unregister unregistered handler with id 534 Attempted to unregister unregistered handler with id 430 Attempted to unregister unregistered handler with id 476 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 337 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 348 Attempted to unregister unregistered handler with id 466 Attempted to unregister unregistered handler with id 347 Failed to abort transfer 
java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 456 Attempted to unregister unregistered handler with id 454 Attempted to unregister unregistered handler with id 508 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at 
org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 511 Attempted to unregister unregistered handler with id 506 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 380 Attempted to unregister unregistered handler with id 502 Attempted to unregister unregistered handler with id 376 Attempted to unregister 
unregistered handler with id 226 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 484 Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093) Task being removed twice? java.lang.Throwable at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291) at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263) at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136) at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168) at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665) at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428) at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219) at 
org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091) Ex098 java.lang.NullPointerException at org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52) at org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46) at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at 
org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Could not fail element Attempted to close nonexistent channel buffers at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279) at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107) at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143) at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Task had no 
contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-256-1-1-1362869544077) error null error null real 4m27.856s user 2m45.576s sys 0m3.697s + mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed midway001$ ----- Original Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Saturday, March 9, 2013 5:05:25 PM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > Mihael, now I think I have a coaster problem. Curiously, it always > seems to happen at about 5 mins into the run. > > Logs for these runs are on midway in eg > /home/wilde/osgdemo/modis/svn/run027 > > leading portion of error from stdout/err is below. 
> > - Mike > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2252-x37dmuy0 > Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 > Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 > Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 > Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 > Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 > Stage in:42 Active:6 > Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 > Stage in:24 Active:24 > Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 > Active:47 Stage out:1 > Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 > Stage in:2 Submitted:1 Active:44 Stage out:1 Finished > successfully:3 > Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 > Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 > Finished successfully:8 > Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 > Stage in:12 Submitting:3 Active:24 Stage out:8 Finished > successfully:16 > Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 > Stage in:23 Submitting:5 Active:15 Stage out:4 Finished > successfully:29 > Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 > Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 > Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 > Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 > Stage in:48 Finished 
successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 > Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 > Stage in:47 Active:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 > Stage in:47 Stage out:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 > Stage in:47 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 > Stage in:47 Submitted:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 > Stage in:48 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 > Stage in:47 Active:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 > Stage in:47 Stage out:1 Finished successfully:49 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60231 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 > Stage in:47 Finished successfully:50 > Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 > Stage in:47 Submitted:1 Finished successfully:50 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > 
/C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60507 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 > Stage in:47 Active:1 Finished successfully:50 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60742 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 > Stage in:46 Active:2 Finished successfully:50 > Execution failed: > Exception in getlanduse: > Arguments: > [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] > Host: beagle > Directory: > modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l > > Caused by: > Shutting down worker > getLandUse, modis02.swift, line 20 > Attempted to unregister unregistered handler with id 526 > Attempted to unregister unregistered handler with id 534 > Attempted to unregister unregistered handler with id 430 > Attempted to unregister unregistered handler with id 476 > Failed to abort transfer > java.util.ConcurrentModificationException > at > 
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at > org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at > org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 337 > Failed to abort transfer > java.util.ConcurrentModificationException > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > 
at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at > org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > at > org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) > at > org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Failed to abort transfer > java.util.ConcurrentModificationException > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at > 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239)
> at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226
>
> ----- Original Message -----
> > From: "Michael Wilde"
> > To: "Mihael Hategan"
> > Cc: "Swift Devel"
> > Sent: Saturday, March 9, 2013 4:24:17 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> >
> > I forgot to paste the error, sorry. It's below now, fer real. When I dial down the throttle to 48 and only start 2 beagle nodes, I get further and the app calls make it to active state. The 317 files being staged in here are 17 MB each.
> >
> > The swift progress output and error are below:
> >
> > RunID: 20130309-2204-qu9ck076
> > Progress: time: Sat, 09 Mar 2013 22:04:34 +0000
> > Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 Submitted:1
> > Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 Submitted:316
> > Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 Submitted:292
> > Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 Submitted:249
> > Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 Submitted:204
> > Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 Submitted:152
> > Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 Submitted:140
> > Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 Submitted:92
> > Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 Submitted:76
> > Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 Submitted:28
> > Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 Submitted:12
> > Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317
> > Progress: time: Sat, 09 Mar 2013
22:06:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > -> BufferingChannel, > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > -> BufferingChannel} > > Context: service-60822 > > Meta context: service-60640 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > -> BufferingChannel, > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > -> BufferingChannel} > > Context: service-60116 > > Meta context: service-60640 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > -> 
BufferingChannel, > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > -> BufferingChannel} > > Context: service-60598 > > Meta context: service-60640 > > Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 > > Active:1 > > Execution failed: > > Exception in getlanduse: > > Arguments: > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb] > > Host: beagle > > Directory: > > modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l > > > > Caused by: > > Shutting down worker > > getLandUse, modis02.swift, line 20 > > error null > > > > real 4m36.777s > > user 2m55.240s > > sys 0m3.837s > > > > > > --- > > > > With a throttle of 48 (.47) and 2 beagle nodes, I see: > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130309-2214-1oi3rvea > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > site:269 > > Submitting:47 Submitted:1 > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > site:269 > > Stage in:1 Submitted:47 > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > site:269 > > Stage in:25 Submitted:23 > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > site:269 > > Stage in:47 Active:1 > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > 
> site:269 Stage in:36 Active:12
> > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
> > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
> > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
> > Execution failed:
> > Exception in getlanduse:
> > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > Host: beagle
> > Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> >
> > Caused by:
> > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > getLandUse, modis02.swift, line 20
> >
> > real 2m31.463s
> > user 1m33.238s
> > sys 0m2.160s
> > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> >
> > This error is likely in the demo app code; just pasting here to show that with less concurrency it makes progress.
> >
> > ----- Original Message -----
> > > From: "Michael Wilde"
> > > To: "Mihael Hategan"
> > > Cc: "Swift Devel"
> > > Sent: Saturday, March 9, 2013 4:11:24 PM
> > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> > >
> > > Now I'm getting the error below (from running 317 simple MODIS apps concurrently). I'm going to dial down the throttle first to see if the staging load is overwhelming either coasters or the midway-beagle path.
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > > > From: "Michael Wilde"
> > > > To: "Mihael Hategan"
> > > > Cc: "Swift Devel"
> > > > Sent: Saturday, March 9, 2013 3:59:22 PM
> > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> > > >
> > > > I think we just got this working.
> > > > Problems may have included the need to pre-create the workdirectory and to specify a dotted IP address on the external network for GLOBUS_HOSTNAME. Will need to experiment. So likely that proxy expiration time was not a problem (although it's confusing).
> > > >
> > > > Will report back on this once the needed steps are clear.
> > > >
> > > > Thanks,
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "Mihael Hategan"
> > > > > To: "Michael Wilde"
> > > > > Cc: "Swift Devel"
> > > > > Sent: Saturday, March 9, 2013 3:56:36 PM
> > > > > Subject: Re: Cant get auto-coasters to run from midway to beagle
> > > > >
> > > > > Can you post .globus/coasters/coaster.log from beagle?
> > > > >
> > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote:
> > > > > > Mihael, can you advise on this problem?
> > > > > >
> > > > > > David and I are trying to run automatic coaster jobs from midway login hosts and swift.rcc to beagle using ssh-cl:pbs.
> > > > > >
> > > > > > My failed attempts are on midway under /home/wilde/osgdemo/modis/svn, see e.g. run020 (which has complete logs).
> > > > > >
> > > > > > Quick question about the proxy files that get copied. Does this look OK?
> > > > > >
> > > > > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem
> > > > > > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23\
> > > > > > 19:25:53 GMT 2013
> > > > > >
> > > > > > The proxy expiration time listed above is two hours *earlier* than the current time (as seen in the message's UTC timestamp).
> > > > > > Is that correct, or a possible cause of this problem?
> > > > > >
> > > > > > The main symptom seems to be this:
> > > > > >
> > > > > > Execution failed:
> > > > > > Exception in getlanduse:
> > > > > > Arguments: [../data/modis/2002/h00v09.rgb]
> > > > > > Host: beagle
> > > > > > Directory: modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l
> > > > > >
> > > > > > Caused by:
> > > > > > Could not submit job
> > > > > > Caused by:
> > > > > > Could not start coaster service
> > > > > > Caused by:
> > > > > > Task ended before registration was received.
> > > > > > Failed to download bootstrap jar from http://midway001.rcc.uchicago.edu:50001
> > > > > > ---
> > > > > >
> > > > > > Yet I've verified that midway login4 (which is the target system) can connect to this hostname and port (with nc -l and telnet)
> > > > > >
> > > > > > - Mike

From hategan at mcs.anl.gov Sat Mar 9 17:11:40 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 09 Mar 2013 15:11:40 -0800
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1522642899.1232101.1362870556551.JavaMail.root@mcs.anl.gov> References: <1522642899.1232101.1362870556551.JavaMail.root@mcs.anl.gov> Message-ID: <1362870700.24889.1.camel@echo> Got it. I'll look a bit later. Right now I'm working on Lorenzo's stuff. Mihael On Sat, 2013-03-09 at 17:09 -0600, Michael Wilde wrote: > See instead run028. Errors below. > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2252-x37dmuy0 > Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 > Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 Stage in:42 Active:6 > Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 Stage in:24 Active:24 > Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 Active:47 Stage out:1 > Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 Stage in:2 Submitted:1 Active:44 Stage out:1 Finished successfully:3 > Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 Finished successfully:8 > Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 Stage in:12 Submitting:3 Active:24 Stage out:8 Finished successfully:16 > Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 Stage in:23 Submitting:5 Active:15 Stage out:4 Finished successfully:29 > Progress: time: Sat, 09 Mar 2013 22:55:03 
+0000 Selecting site:234 Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 > Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 Stage in:47 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 Stage in:48 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60231 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 Stage in:47 Finished successfully:50 > Progress: time: Sat, 09 Mar 
2013 22:56:40 +0000 Selecting site:219 Stage in:47 Submitted:1 Finished successfully:50 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60507 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60742 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50 > Execution failed: > Exception in getlanduse: > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] > Host: beagle > Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l > > Caused by: > Shutting down worker > getLandUse, modis02.swift, line 20 > Attempted to unregister unregistered handler with id 526 > Attempted to unregister unregistered handler with id 534 > Attempted to 
unregister unregistered handler with id 430 > Attempted to unregister unregistered handler with id 476 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 337 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) > at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at 
org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 348 > Attempted to unregister unregistered handler with id 466 > Attempted to unregister unregistered handler with id 347 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 456 > Attempted to unregister unregistered handler 
with id 454 > Attempted to unregister unregistered handler with id 508 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 511 > Attempted to unregister unregistered handler with id 506 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 380 > Attempted to unregister unregistered handler with id 502 > Attempted to unregister unregistered handler with id 376 > Attempted to unregister unregistered handler with id 226 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at 
org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 484 > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093) > Task being removed twice? 
> java.lang.Throwable > at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291) > at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263) > at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136) > at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168) > at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665) > at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428) > at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219) > at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091) > Ex098 > java.lang.NullPointerException > at 
org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52) > at org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46) > at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Could not fail element > Attempted to close 
nonexistent channel buffers > > at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279) > at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107) > at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143) > at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-256-1-1-1362869544077) > error null > error null > > real 4m27.856s > user 2m45.576s > sys 0m3.697s > + mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed > midway001$ > > > ----- Original Message ----- > 
> From: "Michael Wilde" > > To: "Mihael Hategan" > > Cc: "Swift Devel" > > Sent: Saturday, March 9, 2013 5:05:25 PM > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > > > Mihael, now I think I have a coaster problem. Curiously, it always > > seems to happen at about 5 mins into the run. > > > > Logs for these runs are on midway in eg > > /home/wilde/osgdemo/modis/svn/run027 > > > > leading portion of error from stdout/err is below. > > > > - Mike > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130309-2252-x37dmuy0 > > Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 > > Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 > > Submitting:47 Submitted:1 > > Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 > > Stage in:1 Submitted:47 > > Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 > > Stage in:47 Active:1 > > Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 > > Stage in:42 Active:6 > > Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 > > Stage in:24 Active:24 > > Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 > > Active:47 Stage out:1 > > Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 > > Stage in:2 Submitted:1 Active:44 Stage out:1 Finished > > successfully:3 > > Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 > > Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 > > Finished successfully:8 > > Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 > > Stage in:12 Submitting:3 Active:24 
Stage out:8 Finished > > successfully:16 > > Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 > > Stage in:23 Submitting:5 Active:15 Stage out:4 Finished > > successfully:29 > > Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 > > Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 > > Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 > > Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 > > Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 > > Stage in:48 Finished successfully:48 > > Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 > > Stage in:48 Finished successfully:48 > > Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 > > Stage in:47 Active:1 Finished successfully:48 > > Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 > > Stage in:47 Stage out:1 Finished successfully:48 > > Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 > > Stage in:47 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 > > Stage in:47 Submitted:1 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 > > Stage in:48 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 > > Stage in:47 Active:1 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 > > Stage in:47 Stage out:1 Finished successfully:49 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > > -> BufferingChannel, > > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > > -> BufferingChannel, > > null at 
id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > > Context: service-60231 > > Meta context: service-60121 > > Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 > > Stage in:47 Finished successfully:50 > > Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 > > Stage in:47 Submitted:1 Finished successfully:50 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > > -> BufferingChannel, > > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > > -> BufferingChannel, > > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > > Context: service-60507 > > Meta context: service-60121 > > Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 > > Stage in:47 Active:1 Finished successfully:50 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > > -> BufferingChannel, > > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > > -> BufferingChannel, > > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > > Context: service-60742 > > Meta context: service-60121 > > Progress: time: Sat, 09 Mar 2013 
22:56:48 +0000 Selecting site:219 > > Stage in:46 Active:2 Finished successfully:50 > > Execution failed: > > Exception in getlanduse: > > Arguments: > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] > > Host: beagle > > Directory: > > modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l > > > > Caused by: > > Shutting down worker > > getLandUse, modis02.swift, line 20 > > Attempted to unregister unregistered handler with id 526 > > Attempted to unregister unregistered handler with id 534 > > Attempted to unregister unregistered handler with id 430 > > Attempted to unregister unregistered handler with id 476 > > Failed to abort transfer > > java.util.ConcurrentModificationException > > at > > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > > at > > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > > at > > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > > at > > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > > at > > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > > at > > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > > at > > 
org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > > at > > org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > > Attempted to unregister unregistered handler with id 337 > > Failed to abort transfer > > java.util.ConcurrentModificationException > > at > > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > > at > > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > > at > > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > > at > > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > > at > > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > > at > > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > > at > > org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) > > at > > org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > > at > > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > > at > > org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) > > at > > org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) > > at > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > > at > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > > 
at java.util.concurrent.FutureTask.run(FutureTask.java:166) > > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > > at java.lang.Thread.run(Thread.java:722) > > Failed to abort transfer > > java.util.ConcurrentModificationException > > at > > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > > at > > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > > at > > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > > at > > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > > at > > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > > at > > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > > at > > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226 > > > > ----- Original Message ----- > > > From: "Michael Wilde" > > > To: "Mihael Hategan" > > > Cc: "Swift Devel" > > > Sent: Saturday, March 9, 2013 4:24:17 PM > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > midway to beagle > > > > > > I forgot to paste the error, sorry. Its below now, fer real. When > > > I > > > dial down the throttle to 48 and only start 2 beagle nodes, I get > > > further and the app calls make it to active state. The 317 files > > > being staged in here are 17MB each. 
> > > > > > The swift progress output and error are below: > > > > > > RunID: 20130309-2204-qu9ck076 > > > Progress: time: Sat, 09 Mar 2013 22:04:34 +0000 > > > Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 > > > Submitted:1 > > > Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 > > > Submitted:316 > > > Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 > > > Submitted:292 > > > Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 > > > Submitted:249 > > > Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 > > > Submitted:204 > > > Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 > > > Submitted:152 > > > Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 > > > Submitted:140 > > > Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 > > > Submitted:92 > > > Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 > > > Submitted:76 > > > Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 > > > Submitted:28 > > > Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 > > > Submitted:12 > > > Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:06:04 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317 > > > Channels: > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > > -> BufferingChannel, > > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > 
> > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > > -> BufferingChannel} > > > Context: service-60822 > > > Meta context: service-60640 > > > Channels: > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > > -> BufferingChannel, > > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > > -> BufferingChannel} > > > Context: service-60116 > > > Meta context: service-60640 > > > Channels: > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > > -> BufferingChannel, > > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > > -> BufferingChannel} > > > Context: service-60598 > > > Meta context: service-60640 > > > Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317 > > > Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 > > > Active:1 > > > Execution failed: > > > Exception in getlanduse: > > > Arguments: > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb] > > > Host: beagle > > > 
Directory: > > > modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l > > > > > > Caused by: > > > Shutting down worker > > > getLandUse, modis02.swift, line 20 > > > error null > > > > > > real 4m36.777s > > > user 2m55.240s > > > sys 0m3.837s > > > > > > > > > --- > > > > > > With a throttle of 48 (.47) and 2 beagle nodes, I see: > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > RunID: 20130309-2214-1oi3rvea > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > site:269 > > > Submitting:47 Submitted:1 > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > site:269 > > > Stage in:1 Submitted:47 > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > site:269 > > > Stage in:25 Submitted:23 > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > site:269 > > > Stage in:47 Active:1 > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > site:269 > > > Stage in:36 Active:12 > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > site:269 > > > Stage in:24 Active:24 > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > site:269 > > > Stage in:24 Active:23 Stage out:1 > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > site:269 > > > Stage in:14 Active:33 Stage out:1 > > > Execution failed: > > > Exception in getlanduse: > > > Arguments: > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > Host: beagle > > > Directory: > > > 
modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > Caused by: > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed > > > with an exit code of 1 > > > getLandUse, modis02.swift, line 20 > > > > > > real 2m31.463s > > > user 1m33.238s > > > sys 0m2.160s > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > /home/wilde/.swift/runs/completed > > > > > > This error is likely in the demo app code; just pasting here to > > > show > > > that with less concurrency it makes progress. > > > > > > ----- Original Message ----- > > > > From: "Michael Wilde" > > > > To: "Mihael Hategan" > > > > Cc: "Swift Devel" > > > > Sent: Saturday, March 9, 2013 4:11:24 PM > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > > midway to beagle > > > > > > > > Now Im getting the error below (from running 317 simple MODIS > > > > apps > > > > concurrently). Im going to dial down the throttle first to see > > > > if > > > > the staging load is overwhelming either coasters or the > > > > midway-beagle path. > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > From: "Michael Wilde" > > > > > To: "Mihael Hategan" > > > > > Cc: "Swift Devel" > > > > > Sent: Saturday, March 9, 2013 3:59:22 PM > > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > > > midway to beagle > > > > > > > > > > I think we just got this working. Problems may have included > > > > > the > > > > > need > > > > > to pre-create the workdirectory and to specify a dotted IP > > > > > address > > > > > on the external network for GLOBUS_HOSTNAME. Will need to > > > > > experiment. So likely that proxy expiration time was not a > > > > > problem > > > > > (although its confusing). > > > > > > > > > > Will report back on this once the needed steps are clear. 
> > > > > > > > > > Thanks, > > > > > > > > > > - Mike > > > > > > > > > > ----- Original Message ----- > > > > > > From: "Mihael Hategan" > > > > > > To: "Michael Wilde" > > > > > > Cc: "Swift Devel" > > > > > > Sent: Saturday, March 9, 2013 3:56:36 PM > > > > > > Subject: Re: Cant get auto-coasters to run from midway to > > > > > > beagle > > > > > > > > > > > > Can you post ,globus/coasters/coaster.log from beagle? > > > > > > > > > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote: > > > > > > > Mihael, can you advise on this problem? > > > > > > > > > > > > > > David and I are trying to run automatic coaster jobs from > > > > > > > midway > > > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs. > > > > > > > > > > > > > > My failed attempts are on midway under > > > > > > > /home/wilde/osgdemo/modis/svn, see eg run020 (which has > > > > > > > complete > > > > > > > logs). > > > > > > > > > > > > > > Quick question about the proxy files that get copied. Does > > > > > > > this > > > > > > > look OK? : > > > > > > > > > > > > > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking > > > > > > > certificate > > > > > > > /home/wilde/.globus/coasters/proxy.0.pem > > > > > > > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate > > > > > > > /home/wilde/.globus/coasters/proxy.0.pem with expiration > > > > > > > date > > > > > > > Sat > > > > > > > Mar 23\ > > > > > > > 19:25:53 GMT 2013 > > > > > > > > > > > > > > The proxy expiration time listed above is two hours > > > > > > > *earlier* > > > > > > > than > > > > > > > the current time (as seen in the message's UTC timestamp). > > > > > > > Is > > > > > > > that correct, or a possible cause of this problem? 
> > > > > > > > > > > > > > The main symptom seems to be this: > > > > > > > > > > > > > > Execution failed: > > > > > > > Exception in getlanduse: > > > > > > > Arguments: [../data/modis/2002/h00v09.rgb] > > > > > > > Host: beagle > > > > > > > Directory: > > > > > > > modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l > > > > > > > > > > > > > > Caused by: > > > > > > > Could not submit job > > > > > > > Caused by: > > > > > > > Could not start coaster service > > > > > > > Caused by: > > > > > > > Task ended before registration was received. > > > > > > > Failed to download bootstrap jar from > > > > > > > http://midway001.rcc.uchicago.edu:50001 > > > > > > > --- > > > > > > > > > > > > > > Yet I've verified that midway login4 (which is the target > > > > > > > system) > > > > > > > can connect to this hostname and port (with nc -l and > > > > > > > telnet) > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > From wilde at mcs.anl.gov Sat Mar 9 17:12:58 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 9 Mar 2013 17:12:58 -0600 (CST) Subject: [Swift-devel] a note on stress testing In-Reply-To:
<1522642899.1232101.1362870556551.JavaMail.root@mcs.anl.gov> Message-ID: <434037777.1232168.1362870778578.JavaMail.root@mcs.anl.gov> Yadu, below is exactly the kind of error I'm hoping we can catch in the test suite. It is happening on remote submissions from midway to beagle using coaster provider staging of 17 MB input files, so detecting it might require combining site-config testing and stress testing. - Mike ----- Forwarded Message ----- From: "Michael Wilde" To: "Mihael Hategan" Cc: "Swift Devel" Sent: Saturday, March 9, 2013 5:09:16 PM Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle See instead run028. Errors below. Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) RunID: 20130309-2252-x37dmuy0 Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 Submitting:47 Submitted:1 Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 Stage in:1 Submitted:47 Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 Stage in:48 Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 Stage in:47 Active:1 Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 Stage in:42 Active:6 Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 Stage in:24 Active:24 Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 Active:47 Stage out:1 Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 Stage in:2 Submitted:1 Active:44 Stage out:1 Finished successfully:3 Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 Finished successfully:8 Progress: time: Sat, 09 Mar 2013 22:55:01 +0000
Selecting site:254 Stage in:12 Submitting:3 Active:24 Stage out:8 Finished successfully:16 Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 Stage in:23 Submitting:5 Active:15 Stage out:4 Finished successfully:29 Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 Stage in:47 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 Stage in:48 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> 
GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60231 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 Stage in:47 Finished successfully:50 Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 Stage in:47 Submitted:1 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60507 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60742 Meta context: service-60121 Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50 Execution failed: Exception in getlanduse: Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] Host: beagle Directory: 
modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l Caused by: Shutting down worker getLandUse, modis02.swift, line 20 Attempted to unregister unregistered handler with id 526 Attempted to unregister unregistered handler with id 534 Attempted to unregister unregistered handler with id 430 Attempted to unregister unregistered handler with id 476 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 337 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at 
org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 348 Attempted to unregister unregistered handler with id 466 Attempted to unregister unregistered handler with id 347 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered 
handler with id 456 Attempted to unregister unregistered handler with id 454 Attempted to unregister unregistered handler with id 508 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 511 Attempted to unregister unregistered handler with id 506 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 380 Attempted to unregister unregistered handler with id 502 Attempted to unregister unregistered handler with id 376 Attempted to unregister unregistered handler with id 226 Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at 
org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Failed to abort transfer java.util.ConcurrentModificationException at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) at java.util.LinkedList$ListItr.next(LinkedList.java:886) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Attempted to unregister unregistered handler with id 484 Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093) Task being removed twice? 
java.lang.Throwable at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291) at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263) at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136) at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168) at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665) at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428) at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426) at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219) at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227) at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091) Ex098 java.lang.NullPointerException at org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52) at 
org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46) at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Could not fail element Attempted to close nonexistent channel buffers at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279) at 
org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107) at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143) at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103) Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-256-1-1-1362869544077) error null error null real 4m27.856s user 2m45.576s sys 0m3.697s + mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed midway001$ ----- Original Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Saturday, March 9, 2013 5:05:25 PM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from 
midway to beagle > > Mihael, now I think I have a coaster problem. Curiously, it always > seems to happen at about 5 mins into the run. > > Logs for these runs are on midway in eg > /home/wilde/osgdemo/modis/svn/run027 > > leading portion of error from stdout/err is below. > > - Mike > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2252-x37dmuy0 > Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 > Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 > Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 > Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 > Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 > Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 > Stage in:42 Active:6 > Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 > Stage in:24 Active:24 > Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 > Active:47 Stage out:1 > Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 > Stage in:2 Submitted:1 Active:44 Stage out:1 Finished > successfully:3 > Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 > Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 > Finished successfully:8 > Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 > Stage in:12 Submitting:3 Active:24 Stage out:8 Finished > successfully:16 > Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 > Stage in:23 Submitting:5 Active:15 Stage out:4 Finished > successfully:29 > Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 > Stage in:28 
Submitting:7 Stage out:12 Finished successfully:36 > Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 > Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 > Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 > Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 > Stage in:47 Active:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 > Stage in:47 Stage out:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 > Stage in:47 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 > Stage in:47 Submitted:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 > Stage in:48 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 > Stage in:47 Active:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 > Stage in:47 Stage out:1 Finished successfully:49 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60231 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 > Stage in:47 Finished successfully:50 > Progress: time: Sat, 09 
Mar 2013 22:56:40 +0000 Selecting site:219 > Stage in:47 Submitted:1 Finished successfully:50 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60507 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 > Stage in:47 Active:1 Finished successfully:50 > Channels: > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] > -> BufferingChannel, > null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] > -> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60742 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 > Stage in:46 Active:2 Finished successfully:50 > Execution failed: > Exception in getlanduse: > Arguments: > [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] > Host: beagle > Directory: > modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l > > Caused by: > Shutting down worker > getLandUse, modis02.swift, line 20 > Attempted to unregister unregistered handler with id 526 > Attempted to 
unregister unregistered handler with id 534 > Attempted to unregister unregistered handler with id 430 > Attempted to unregister unregistered handler with id 476 > Failed to abort transfer > java.util.ConcurrentModificationException > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at > org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at > org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 337 > Failed to abort transfer > java.util.ConcurrentModificationException > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at > 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at > org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > at > org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) > at > org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Failed to abort transfer > java.util.ConcurrentModificationException > at > java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at > 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at > org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at > org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at > org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226 > > ----- Original Message ----- > > From: "Michael Wilde" > > To: "Mihael Hategan" > > Cc: "Swift Devel" > > Sent: Saturday, March 9, 2013 4:24:17 PM > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > midway to beagle > > > > I forgot to paste the error, sorry. Its below now, fer real. When > > I > > dial down the throttle to 48 and only start 2 beagle nodes, I get > > further and the app calls make it to active state. The 317 files > > being staged in here are 17MB each. 
> > > > The swift progress output and error are below: > > > > RunID: 20130309-2204-qu9ck076 > > Progress: time: Sat, 09 Mar 2013 22:04:34 +0000 > > Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 > > Submitted:1 > > Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 > > Submitted:316 > > Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 > > Submitted:292 > > Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 > > Submitted:249 > > Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 > > Submitted:204 > > Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 > > Submitted:152 > > Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 > > Submitted:140 > > Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 > > Submitted:92 > > Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 > > Submitted:76 > > Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 > > Submitted:28 > > Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 > > Submitted:12 > > Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:06:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > -> BufferingChannel, > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > 
> null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > -> BufferingChannel} > > Context: service-60822 > > Meta context: service-60640 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > -> BufferingChannel, > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > -> BufferingChannel} > > Context: service-60116 > > Meta context: service-60640 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] > > -> BufferingChannel, > > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] > > -> BufferingChannel} > > Context: service-60598 > > Meta context: service-60640 > > Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317 > > Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 > > Active:1 > > Execution failed: > > Exception in getlanduse: > > Arguments: > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb] > > Host: beagle > > Directory: > > modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l > > > > Caused by: > > Shutting down worker > > getLandUse, modis02.swift, line 20 > > error 
null > > > > real 4m36.777s > > user 2m55.240s > > sys 0m3.837s > > > > > > --- > > > > With a throttle of 48 (.47) and 2 beagle nodes, I see: > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130309-2214-1oi3rvea > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > site:269 > > Submitting:47 Submitted:1 > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > site:269 > > Stage in:1 Submitted:47 > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > site:269 > > Stage in:25 Submitted:23 > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > site:269 > > Stage in:48 > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > site:269 > > Stage in:47 Active:1 > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > site:269 > > Stage in:36 Active:12 > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > site:269 > > Stage in:24 Active:24 > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > site:269 > > Stage in:24 Active:23 Stage out:1 > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > site:269 > > Stage in:14 Active:33 Stage out:1 > > Execution failed: > > Exception in getlanduse: > > Arguments: > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > Host: beagle > > Directory: > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > Caused by: > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed > > with an exit code of 1 > > getLandUse, modis02.swift, line 20 > > > > real 2m31.463s > > user 1m33.238s > > sys 0m2.160s > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > 
> /home/wilde/.swift/runs/completed > > > > This error is likely in the demo app code; just pasting here to > > show > > that with less concurrency it makes progress. > > > > ----- Original Message ----- > > > From: "Michael Wilde" > > > To: "Mihael Hategan" > > > Cc: "Swift Devel" > > > Sent: Saturday, March 9, 2013 4:11:24 PM > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > midway to beagle > > > > > > Now Im getting the error below (from running 317 simple MODIS > > > apps > > > concurrently). Im going to dial down the throttle first to see > > > if > > > the staging load is overwhelming either coasters or the > > > midway-beagle path. > > > > > > - Mike > > > > > > > > > ----- Original Message ----- > > > > From: "Michael Wilde" > > > > To: "Mihael Hategan" > > > > Cc: "Swift Devel" > > > > Sent: Saturday, March 9, 2013 3:59:22 PM > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > > midway to beagle > > > > > > > > I think we just got this working. Problems may have included > > > > the > > > > need > > > > to pre-create the workdirectory and to specify a dotted IP > > > > address > > > > on the external network for GLOBUS_HOSTNAME. Will need to > > > > experiment. So likely that proxy expiration time was not a > > > > problem > > > > (although its confusing). > > > > > > > > Will report back on this once the needed steps are clear. > > > > > > > > Thanks, > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > From: "Mihael Hategan" > > > > > To: "Michael Wilde" > > > > > Cc: "Swift Devel" > > > > > Sent: Saturday, March 9, 2013 3:56:36 PM > > > > > Subject: Re: Cant get auto-coasters to run from midway to > > > > > beagle > > > > > > > > > > Can you post ,globus/coasters/coaster.log from beagle? > > > > > > > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote: > > > > > > Mihael, can you advise on this problem? 
> > > > > > > > > > > > David and I are trying to run automatic coaster jobs from > > > > > > midway > > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs. > > > > > > > > > > > > My failed attempts are on midway under > > > > > > /home/wilde/osgdemo/modis/svn, see eg run020 (which has > > > > > > complete > > > > > > logs). > > > > > > > > > > > > Quick question about the proxy files that get copied. Does > > > > > > this > > > > > > look OK? : > > > > > > > > > > > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking > > > > > > certificate > > > > > > /home/wilde/.globus/coasters/proxy.0.pem > > > > > > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate > > > > > > /home/wilde/.globus/coasters/proxy.0.pem with expiration > > > > > > date > > > > > > Sat > > > > > > Mar 23\ > > > > > > 19:25:53 GMT 2013 > > > > > > > > > > > > The proxy expiration time listed above is two hours > > > > > > *earlier* > > > > > > than > > > > > > the current time (as seen in the message's UTC timestamp). > > > > > > Is > > > > > > that correct, or a possible cause of this problem? > > > > > > > > > > > > The main symptom seems to be this: > > > > > > > > > > > > Execution failed: > > > > > > Exception in getlanduse: > > > > > > Arguments: [../data/modis/2002/h00v09.rgb] > > > > > > Host: beagle > > > > > > Directory: > > > > > > modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l > > > > > > > > > > > > Caused by: > > > > > > Could not submit job > > > > > > Caused by: > > > > > > Could not start coaster service > > > > > > Caused by: > > > > > > Task ended before registration was received. 
> > > > > > Failed to download bootstrap jar from http://midway001.rcc.uchicago.edu:50001 > > > > > > --- > > > > > > > > > > > > Yet I've verified that midway login4 (which is the target system) can connect to this hostname and port (with nc -l and telnet). > > > > > > > > > > > > - Mike From wilde at mcs.anl.gov Sat Mar 9 17:49:41 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 9 Mar 2013 17:49:41 -0600 (CST) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1003631403.1232200.1362871309165.JavaMail.root@mcs.anl.gov> Message-ID: <1737988661.1232489.1362872981749.JavaMail.root@mcs.anl.gov> An update on this provider-staging-related issue: reducing the file size from 17MB to 600KB runs well. So it seems like some kind of flow control or buffer management problem, possibly? May need to take that problem offline - it would be a perfect test case for Yadu to develop a new stress test for.
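As a rough check on the sizes discussed above, here is a minimal Python sketch of the staging volume involved. The 2400 x 2400 tile geometry and the raw 3-bytes-per-pixel RGB layout are assumptions (they happen to reproduce roughly 17MB per file); the thread itself only gives the 17MB and 600KB figures and the count of 317 files. `tile_bytes` is a hypothetical helper for illustration, not part of Swift.

```python
# Assumed tile geometry: 2400 x 2400 pixels. This is NOT stated in the thread;
# it is chosen because it yields roughly the 17MB RGB file size that is.
TILE_SIDE = 2400
RGB_BYTES_PER_PIXEL = 3   # assumption: raw RGB, one byte per channel
NUM_FILES = 317           # number of files staged in, per the thread


def tile_bytes(side, bytes_per_pixel, reduction=1):
    """Size in bytes of a square raw tile after shrinking each side by `reduction`."""
    reduced_side = side // reduction
    return reduced_side * reduced_side * bytes_per_pixel


if __name__ == "__main__":
    cases = [
        ("full-size RGB", tile_bytes(TILE_SIDE, RGB_BYTES_PER_PIXEL)),
        ("5x5-reduced RGB", tile_bytes(TILE_SIDE, RGB_BYTES_PER_PIXEL, reduction=5)),
    ]
    for label, size in cases:
        total = NUM_FILES * size
        print(f"{label}: {size / 1e6:.2f} MB per file, "
              f"{total / 1e9:.2f} GB across {NUM_FILES} files")
```

Under these assumptions the full-size run stages several GB through the coaster channel while the reduced run stages a few hundred MB, which is at least consistent with a flow-control or buffering limit rather than a per-file problem.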
- Mike ----- Forwarded Message ----- From: "Michael Wilde" To: "David Kelly" Sent: Saturday, March 9, 2013 5:21:49 PM Subject: Re: runs for OSG talk OK, much better: with 600K files (5x5 reduction or 25X smaller) it works well, and fast (form midway to beagle!) Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) RunID: 20130309-2319-5zq0jrfg Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 Submitting:47 Submitted:1 Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 Stage in:1 Submitted:47 Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 Stage in:47 Active:1 Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 Stage in:46 Active:1 Stage out:1 Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 Stage in:19 Active:28 Stage out:1 Finished successfully:19 Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41 Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 Stage in:38 Active:1 Stage out:9 Finished successfully:49 Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 Stage in:30 Submitting:8 Stage out:9 Finished successfully:58 Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 Stage in:38 Submitting:8 Submitted:1 Finished successfully:67 Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 Stage in:19 Stage out:28 Finished successfully:68 Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished successfully:97 Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 Stage in:31 Submitting:2 Stage out:14 Finished successfully:100 Progress: time: Sat, 09 Mar 2013 23:20:21 
+0000 Selecting site:162 Stage in:30 Submitting:10 Stage out:6 Finished successfully:109 Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 Finished successfully:115 Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 Stage in:21 Active:10 Stage out:16 Finished successfully:116 Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished successfully:143 Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 Stage in:31 Active:2 Stage out:15 Finished successfully:145 Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 Stage in:30 Submitting:14 Stage out:3 Finished successfully:160 Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished successfully:163 Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 Finished successfully:165 Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished successfully:191 Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 Stage in:30 Stage out:17 Finished successfully:194 Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 Stage in:29 Submitting:18 Active:1 Finished successfully:211 Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 Stage in:33 Active:3 Stage out:12 Finished successfully:211 Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished successfully:225 Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 Stage in:29 Active:14 Stage out:3 Finished successfully:241 Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 Stage in:28 Submitting:2 Stage out:17 Finished successfully:242 Progress: time: Sat, 09 Mar 
2013 23:20:37 +0000 Selecting site:10 Stage in:30 Submitting:17 Submitted:1 Finished successfully:259 Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 Stage in:35 Stage out:13 Finished successfully:259 Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 Submitting:6 Submitted:3 Stage out:15 Finished successfully:272 Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 Stage out:14 Finished successfully:288 Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317 real 0m58.953s user 0m32.573s sys 0m1.263s + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed midway001$ ----- Original Message ----- > From: "David Kelly" > To: "Michael Wilde" > Sent: Saturday, March 9, 2013 5:12:59 PM > Subject: Re: runs for OSG talk > > > Yep - I had a version where the input files were in a very similar > format (PGM, 1 byte per pixel). I'll add that back, but without the > small PGM header in the files. > > ----- Original Message ----- > > > From: "Michael Wilde" > To: "David Kelly" > Sent: Saturday, March 9, 2013 5:04:43 PM > Subject: Re: runs for OSG talk > > I think we need to cut down the size of these files for a demo > (although they are great for a stress test). > > First, the RGB format by itself uses 3 bytes per pixel when it only > needs one (for land use) > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). > > I tried that using simple convert statements, but it always seems to > yield a file exactly double what it should be. > > More on this later; was hoping to get things working "as is" first. > > I assume you could get the perl code to work on one-byte-per-pixel > instead of the default 3 for the convert rgb format? 
> > - Mike > > ----- Original Message ----- > > From: "David Kelly" > > To: "Michael Wilde" > > Sent: Saturday, March 9, 2013 4:36:30 PM > > Subject: Re: runs for OSG talk > > > > > > That would probably be a good idea for a new script, to show how to stage apps like that. For now I updated the scripts on lustre; hopefully that helps. > > > > ----- Original Message ----- > > > > > > From: "Michael Wilde" > > To: "David Kelly" > > Sent: Saturday, March 9, 2013 4:29:14 PM > > Subject: Re: runs for OSG talk > > > > OK, I see that it's trying to run getlanduse.sh from your /lustre dir on beagle, which is different from the one I've got checked out. It seems to get an error in a stderr redirect??? Let me see what I need to do to get the beagle side in sync. > > > > Seems like since these are perl scripts, we should make the app() /bin/sh and send the script as data, perhaps? > > > > - Mike > > > > ----- Original Message ----- > > > From: "Michael Wilde" > > > To: "David Kelly" > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > Subject: Re: runs for OSG talk > > > > > > OK, making progress. Now I dialed down the throttle and node counts to 48 jobs.
> > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > RunID: 20130309-2214-1oi3rvea > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > site:269 > > > Submitting:47 Submitted:1 > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > site:269 > > > Stage in:1 Submitted:47 > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > site:269 > > > Stage in:25 Submitted:23 > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > site:269 > > > Stage in:48 > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > site:269 > > > Stage in:47 Active:1 > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > site:269 > > > Stage in:36 Active:12 > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > site:269 > > > Stage in:24 Active:24 > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > site:269 > > > Stage in:24 Active:23 Stage out:1 > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > site:269 > > > Stage in:14 Active:33 Stage out:1 > > > Execution failed: > > > Exception in getlanduse: > > > Arguments: > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > Host: beagle > > > Directory: > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > Caused by: > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed > > > with an exit code of 1 > > > getLandUse, modis02.swift, line 20 > > > > > > real 2m31.463s > > > user 1m33.238s > > > sys 0m2.160s > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > 
/home/wilde/.swift/runs/completed > > > midway001$ > > > > > > > > > ----- Original Message ----- > > > > From: "David Kelly" > > > > To: "Michael Wilde" > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > OK, I'll take a look at that. The run dir I used was /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "Michael Wilde" > > > > To: "David Kelly" > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > I just tried this, but it didn't work - same problem. > > > > > > > > But if it's working for you now, we must be close. > > > > > > > > Not yet sure what the difference is... > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > From: "David Kelly" > > > > > To: "Michael Wilde" > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 (128.135.112.71 for midway-login1), not a local address or an InfiniBand address. > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > From: "David Kelly" > > > > > To: "Michael Wilde" > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > I just got it working.
I had to adjust for the differences in my username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME on Midway to the IP address, rather than the full hostname. > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > To: "David Kelly" > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > Yes. And I verified that I can ssh to login4 on beagle from my midway session (as indeed the scp's of the proxy files seem to be working). > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > OK. > > > > > > > > > > > > Ignore what I said about "problem finding java" - that's code in the very long escaped shell command that gets sent to the remote side. I don't *think* that's the problem. > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 etc on swift.rcc, and that seems OK. > > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on the midway side. And the beagle side seems to be connecting there.
> > > > > > > > > > > > I'm a bit confused about the timestamps I see for the proxy expiration time, but am not yet suspicious of that (although it seems less than 5 hours past GMT... not sure.) > > > > > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now... looking into it. > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with finding Java, I assume on beagle, and also service ending (presumably coaster service on midway host). > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I think answers my question about security. > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > > > > > > > > I've been experimenting but still can't get it to work, same error (can't connect to bootstrap port) > > > > > > > > > > > > > > > > When you tried ssh-cl to beagle with automatic coasters, what configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > I verified that beagle can connect back to the midway hosts and ports. > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy etc? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > One way you can override/customize the default templates is to create them in $HOME/.swift/sites (I'm not sure if that's what you mean by a local sites dir or not). But you are right about Midway - I have noticed that when using modis it will sometimes get stuck when it goes to a queue that is busy. Ideally swift replication would be able to help better handle that, but I haven't had much luck with that yet.
Another way around this may be to add this to the template: > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to swift-devel for discussion but was never fixed. I think it is relatively simple though.. probably worth fixing before release. > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay Tue night to avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > I'm trying the modis demo. > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can modify the sites templates; that's not working for me either yet. > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb (but not both) and ensure 1-node jobs, because either queue can get filled and not yield an idle node for a long time. Maybe need to fiddle jobsPerNode to get at least 1 core when the system is busy and *pretend* that it's a node.
> > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; that isn't working because the template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting produced - I thought we eliminated that. Did it come back due to a problem with that fix? > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "David Kelly" > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the most interesting/useful talks will be on Tuesday. Monday I'll come to Argonne to work on any loose ends and put the finishing touches on any slides/runs/scripts, then drive to Indianapolis on Monday afternoon/evening. I have a hotel booked for Monday night. > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked about.
I'm pretty sure I have working configurations for everything we talked about, so I think it's really just a matter of plugging in the apps. > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that I'm looking into the run options now. I'm hoping to try a few... Will see how much help I need. Have you decided on a driving time and made hotel arrangements? > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion of the OSG meeting you feel is of value. The only thing I ask is that for Wed and Thu you stay available online for user-support or other assistance needs that come up here. And that you engage with people that can help us develop the Swift user community and reliable OSG usage.
Rob, Marco, Lincoln, and Suchandra would be good to hang out with and they can introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a UChicago travel expense report. > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of additional ExTENCI funds to make Swift do smarter data management on OSG sites (and in general) so anything you learn about OSG storage elements/services/tools will be valuable for that (srmcp, lcgcp, etc). > > > > > > > > > > > > > > > > > > > > Between now and your talk, let's just focus on the talk, OK? I'm hoping we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other hello-world-like tests to cover the "routes" we discussed, that would pave the way for plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > Sound good?
Let me know of any concerns (other than the fact that this is a tad rushed ;) > > > > > > > > > > > > > > > > > > > > Thanks and regards, > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Michael Wilde > > > > > > > > > > Computation Institute, University of Chicago > > > > > > > > > > Mathematics and Computer Science Division > > > > > > > > > > Argonne National Laboratory From hategan at mcs.anl.gov Sat Mar 9 19:43:29 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 09 Mar 2013 17:43:29 -0800 Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1737988661.1232489.1362872981749.JavaMail.root@mcs.anl.gov> References: <1737988661.1232489.1362872981749.JavaMail.root@mcs.anl.gov> Message-ID: <1362879809.26464.1.camel@echo> I noticed some random weirdness due to the fact that the coaster service runs with the ibm jdk. I'll run some tests with both and see what happens. Mihael On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote: > An update on this provider staging related issue: reducing filesize from 17MB to 600KB runs well. > > So seems like some kind of flow control or buffer management problem, possibly? > > May need to take that problem offline - would be a perfect test case for Yadu to develop a new stress test for. > > - Mike > > > ----- Forwarded Message ----- > From: "Michael Wilde" > To: "David Kelly" > Sent: Saturday, March 9, 2013 5:21:49 PM > Subject: Re: runs for OSG talk > > OK, much better: with 600K files (5x5 reduction or 25X smaller) it works well, and fast (from midway to beagle!)
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2319-5zq0jrfg > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 Stage in:46 Active:1 Stage out:1 > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 Stage in:19 Active:28 Stage out:1 Finished successfully:19 > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41 > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 Stage in:38 Active:1 Stage out:9 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 Stage in:30 Submitting:8 Stage out:9 Finished successfully:58 > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 Stage in:38 Submitting:8 Submitted:1 Finished successfully:67 > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 Stage in:19 Stage out:28 Finished successfully:68 > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished successfully:97 > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 Stage in:31 Submitting:2 Stage out:14 Finished successfully:100 > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 Stage in:30 Submitting:10 Stage out:6 Finished successfully:109 > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 Finished 
successfully:115 > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 Stage in:21 Active:10 Stage out:16 Finished successfully:116 > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished successfully:143 > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 Stage in:31 Active:2 Stage out:15 Finished successfully:145 > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 Stage in:30 Submitting:14 Stage out:3 Finished successfully:160 > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished successfully:163 > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 Finished successfully:165 > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished successfully:191 > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 Stage in:30 Stage out:17 Finished successfully:194 > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 Stage in:29 Submitting:18 Active:1 Finished successfully:211 > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 Stage in:33 Active:3 Stage out:12 Finished successfully:211 > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished successfully:225 > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 Stage in:29 Active:14 Stage out:3 Finished successfully:241 > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 Stage in:28 Submitting:2 Stage out:17 Finished successfully:242 > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 Stage in:30 Submitting:17 Submitted:1 Finished successfully:259 > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 Stage in:35 
Stage out:13 Finished successfully:259 > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 Submitting:6 Submitted:3 Stage out:15 Finished successfully:272 > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 Stage out:14 Finished successfully:288 > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317 > > real 0m58.953s > user 0m32.573s > sys 0m1.263s > + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed > midway001$ > > > > ----- Original Message ----- > > From: "David Kelly" > > To: "Michael Wilde" > > Sent: Saturday, March 9, 2013 5:12:59 PM > > Subject: Re: runs for OSG talk > > > > > > Yep - I had a version where the input files were in a very similar > > format (PGM, 1 byte per pixel). I'll add that back, but without the > > small PGM header in the files. > > > > ----- Original Message ----- > > > > > > From: "Michael Wilde" > > To: "David Kelly" > > Sent: Saturday, March 9, 2013 5:04:43 PM > > Subject: Re: runs for OSG talk > > > > I think we need to cut down the size of these files for a demo > > (although they are great for a stress test). > > > > First, the RGB format by itself uses 3 bytes per pixel when it only > > needs one (for land use) > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). > > > > I tried that using simple convert statements, but it always seems to > > yield a file exactly double what it should be. > > > > More on this later; was hoping to get things working "as is" first. > > > > I assume you could get the perl code to work on one-byte-per-pixel > > instead of the default 3 for the convert rgb format? > > > > - Mike > > > > ----- Original Message ----- > > > From: "David Kelly" > > > To: "Michael Wilde" > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > Subject: Re: runs for OSG talk > > > > > > > > > That would probably be a good idea for a new script, to show how to > > > stage apps like that. 
For now I updated the scripts on lustre.. hopefully that helps. > > > > > > ----- Original Message ----- > > > From: "Michael Wilde" > > > To: "David Kelly" > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > Subject: Re: runs for OSG talk > > > > > > OK, I see that it's trying to run getlanduse.sh from your /lustre dir on beagle, which is different than the one I've got checked out. It seems to get an error in a stderr redirect??? Let me see what I need to do to get the beagle side in sync. > > > > > > Seems like since these are perl scripts, we should make the app() /bin/sh and send the script as data, perhaps? > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > From: "Michael Wilde" > > > > To: "David Kelly" > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > OK, making progress. Now I dialed down the throttle and node counts to 48 jobs. > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1 > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47 > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23 > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1 > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36 Active:12 > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24 > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1 > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1 > > > > Execution failed: > > > > Exception in getlanduse: > > > > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > Host: beagle > > > > Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > Caused by: > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1 > > > > getLandUse, modis02.swift, line 20 > > > > > > > > real 2m31.463s > > > > user 1m33.238s > > > > sys 0m2.160s > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed > > > > midway001$ > > > > > > > > ----- Original Message ----- > > > > > From: "David Kelly" > > > > > To: "Michael Wilde" > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > ok, I'll take a look at that. The run dir I used was /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > ----- Original Message ----- > > > > > From: "Michael Wilde" > > > > > To: "David Kelly" > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > I just tried this, but didn't work - same prob. > > > > > > > > > > But if it's working for you now, we must be close. > > > > > > > > > > Not yet sure what the diff is...
> > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > - Mike > > > > > > > > > > ----- Original Message ----- > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > (128.135.112.71 > > > > > > for midway-login1), not a local address or an infiniband > > > > > > address. > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > I just got it working. I had to adjust for the differences in > > > > > > my > > > > > > username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME > > > > > > on > > > > > > Midway to the IP address, rather than the full hostname > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > Yes. 
And I verified that I can ssh to login4 on beagle from > > > > > > my > > > > > > midway > > > > > > session (as indeed the scp's of the proxy files seem to be > > > > > > working) > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > Ignore what I said about "problem finding java" - thats > > > > > > > code > > > > > > > in > > > > > > > the > > > > > > > very long escaped shell command that gets sent to the > > > > > > > remote > > > > > > > side. > > > > > > > I > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 etc > > > > > > > on > > > > > > > swift.rcc, and that seems OK. > > > > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on > > > > > > > the > > > > > > > midway > > > > > > > side. And the beagle side seems to be connecting there. > > > > > > > > > > > > > > Im a bit confused about the timestamps I see for the proxy > > > > > > > expiration > > > > > > > time, but am not yet suspicious of that (although it seems > > > > > > > less > > > > > > > than > > > > > > > 5 hours past GMT... not sure.) > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. 
looking into it > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with > > > > > > > > finding > > > > > > > > Java, > > > > > > > > I > > > > > > > > assume on beagle, ans also service ending (presumably > > > > > > > > coaster > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I think > > > > > > > > answers > > > > > > > > my > > > > > > > > question about security. > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to work, > > > > > > > > > same > > > > > > > > > error > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with automatic > > > > > > > > > coasters, > > > > > > > > > what > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the midway > > > > > > > > > hosts > > > > > > > > > and > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy etc? 
> > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "David Kelly" > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the default > > > > > > > > > > templates > > > > > > > > > > is > > > > > > > > > > to > > > > > > > > > > create > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if that's > > > > > > > > > > what > > > > > > > > > > you > > > > > > > > > > mean > > > > > > > > > > by > > > > > > > > > > a local sites dir or not). But you are right about > > > > > > > > > > Midway > > > > > > > > > > - > > > > > > > > > > I > > > > > > > > > > have > > > > > > > > > > noticed that when using modis it will sometimes get > > > > > > > > > > stuck > > > > > > > > > > when > > > > > > > > > > it > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > replication > > > > > > > > > > would > > > > > > > > > > be > > > > > > > > > > able to help better handle that, but I haven't had > > > > > > > > > > much > > > > > > > > > > luck > > > > > > > > > > with > > > > > > > > > > that yet. Another way around this may be to add this > > > > > > > > > > to > > > > > > > > > > the > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > swift-devel > > > > > > > > > > for > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > relatively > > > > > > > > > > simple > > > > > > > > > > though.. probably worth fixing before release. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay > > > > > > > > > > Tue > > > > > > > > > > night > > > > > > > > > > to > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can modify > > > > > > > > > > the > > > > > > > > > > sites > > > > > > > > > > templates; thats not working for me either yet. > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb (but > > > > > > > > > > not > > > > > > > > > > both) > > > > > > > > > > and > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > filled > > > > > > > > > > and > > > > > > > > > > not > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > fiddle > > > > > > > > > > jobsPerNode > > > > > > > > > > to get at least 1 core when the system is busy and > > > > > > > > > > *pretend* > > > > > > > > > > that > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > working > > > > > > > > > > because > > > > > > > > > > the > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting produced - > > > > > > > > > > I > > > > > > > > > > thought > > > > > > > > > > we > > > > > > > > > > eliminated that. 
Did it come back due to a problem > > > > > > > > > > with > > > > > > > > > > that > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > most > > > > > > > > > > > interesting/useful talks will be on Tuesday. Monday > > > > > > > > > > > I'll > > > > > > > > > > > come > > > > > > > > > > > to > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > finishing > > > > > > > > > > > touches > > > > > > > > > > > on > > > > > > > > > > > any slides/runs/scripts, then drive to Indianapolis > > > > > > > > > > > on > > > > > > > > > > > Monday > > > > > > > > > > > afternoon/evening. I have a hotel booked for Monday > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked about. > > > > > > > > > > > I'm > > > > > > > > > > > pretty > > > > > > > > > > > sure > > > > > > > > > > > I > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > talked > > > > > > > > > > > about, > > > > > > > > > > > so > > > > > > > > > > > I > > > > > > > > > > > think it's really just a matter of plugging in the > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking into > > > > > > > > > > > the > > > > > > > > > > > run > > > > > > > > > > > options > > > > > > > > > > > now. Im hoping to try a few... WIll see how much > > > > > > > > > > > help > > > > > > > > > > > I > > > > > > > > > > > need. > > > > > > > > > > > Have > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion of > > > > > > > > > > > the > > > > > > > > > > > OSG > > > > > > > > > > > meeting > > > > > > > > > > > you > > > > > > > > > > > feel is of value. The only thing I ask is that for > > > > > > > > > > > Wed > > > > > > > > > > > and > > > > > > > > > > > Thu > > > > > > > > > > > you > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > assistance > > > > > > > > > > > needs > > > > > > > > > > > that come up here. And that you engage with people > > > > > > > > > > > that > > > > > > > > > > > can > > > > > > > > > > > help > > > > > > > > > > > us > > > > > > > > > > > develop the Swift user community and reliable OSG > > > > > > > > > > > usage. > > > > > > > > > > > Rob, > > > > > > > > > > > Marco, > > > > > > > > > > > Lincoln, and Suchandra would be good to hang out > > > > > > > > > > > with > > > > > > > > > > > and > > > > > > > > > > > they > > > > > > > > > > > can > > > > > > > > > > > introduce you to good contacts. 
> > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > UChicago > > > > > > > > > > > travel > > > > > > > > > > > expense > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > additional > > > > > > > > > > > ExTENCI > > > > > > > > > > > funds to make Swift do smarter data management on > > > > > > > > > > > OSG > > > > > > > > > > > sites > > > > > > > > > > > (and > > > > > > > > > > > in > > > > > > > > > > > general) so anything you learn about OSG storage > > > > > > > > > > > elements/services/tools will be valuable for that > > > > > > > > > > > (srmcp, > > > > > > > > > > > lcgcp, > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on the > > > > > > > > > > > talk, > > > > > > > > > > > OK? > > > > > > > > > > > Im > > > > > > > > > > > hoping > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > hello-world-like > > > > > > > > > > > tests > > > > > > > > > > > to cover the "routes" we discussed, that would pave > > > > > > > > > > > the > > > > > > > > > > > way > > > > > > > > > > > for > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > >
> > > > > > > > > > > Thanks and regards,
> > > > > > > > > > >
> > > > > > > > > > > - Mike
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Michael Wilde
> > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > Argonne National Laboratory

From ketancmaheshwari at gmail.com Sat Mar 9 20:24:56 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Sat, 9 Mar 2013 20:24:56 -0600
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1362879809.26464.1.camel@echo>
References: <1737988661.1232489.1362872981749.JavaMail.root@mcs.anl.gov> <1362879809.26464.1.camel@echo>
Message-ID:

The IBM JDK on beagle is known not to work well with Swift coasters. We had to switch to the Sun JDK for ssh:pbs runs from bridled/communicado.

On Sat, Mar 9, 2013 at 7:43 PM, Mihael Hategan wrote:
> I noticed some random weirdness due to the fact that the coaster service runs with the ibm jdk.
>
> I'll run some tests with both and see what happens.
>
> Mihael
>
> On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > An update on this provider staging related issue: reducing filesize from 17MB to 600KB runs well.
> > > > So seems like some kind of flow control or buffer management problem, > possibly? > > > > May need to take that problem offline - would be a perfect test case for > Yadu to develop a new stress test for. > > > > - Mike > > > > > > ----- Forwarded Message ----- > > From: "Michael Wilde" > > To: "David Kelly" > > Sent: Saturday, March 9, 2013 5:21:49 PM > > Subject: Re: runs for OSG talk > > > > OK, much better: with 600K files (5x5 reduction or 25X smaller) it works > well, and fast (form midway to beagle!) > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130309-2319-5zq0jrfg > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 > Submitting:47 Submitted:1 > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 > Stage in:1 Submitted:47 > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 > Stage in:47 Active:1 > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 > Stage in:46 Active:1 Stage out:1 > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 > Stage in:19 Active:28 Stage out:1 Finished successfully:19 > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 > Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41 > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 > Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 > Stage in:38 Active:1 Stage out:9 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 > Stage in:30 Submitting:8 Stage out:9 Finished successfully:58 > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 > Stage in:38 Submitting:8 Submitted:1 Finished successfully:67 > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 > Stage in:19 Stage out:28 
Finished successfully:68 > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 > Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished > successfully:97 > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 > Stage in:31 Submitting:2 Stage out:14 Finished successfully:100 > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 > Stage in:30 Submitting:10 Stage out:6 Finished successfully:109 > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 > Stage in:39 Submitting:5 Submitted:3 Active:1 Finished successfully:115 > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 > Stage in:21 Active:10 Stage out:16 Finished successfully:116 > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 > Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished > successfully:143 > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 > Stage in:31 Active:2 Stage out:15 Finished successfully:145 > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 > Stage in:30 Submitting:14 Stage out:3 Finished successfully:160 > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 > Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished > successfully:163 > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 > Stage in:20 Submitting:2 Active:7 Stage out:19 Finished > successfully:165 > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 > Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished > successfully:191 > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 > Stage in:30 Stage out:17 Finished successfully:194 > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 > Stage in:29 Submitting:18 Active:1 Finished successfully:211 > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 > Stage in:33 Active:3 Stage out:12 Finished 
successfully:211 > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished > successfully:225 > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 > Stage in:29 Active:14 Stage out:3 Finished successfully:241 > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 > Stage in:28 Submitting:2 Stage out:17 Finished successfully:242 > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 > Stage in:30 Submitting:17 Submitted:1 Finished successfully:259 > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 > Stage in:35 Stage out:13 Finished successfully:259 > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > Submitting:6 Submitted:3 Stage out:15 Finished successfully:272 > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 > Stage out:14 Finished successfully:288 > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317 > > > > real 0m58.953s > > user 0m32.573s > > sys 0m1.263s > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > /home/wilde/.swift/runs/completed > > midway001$ > > > > > > > > ----- Original Message ----- > > > From: "David Kelly" > > > To: "Michael Wilde" > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > Subject: Re: runs for OSG talk > > > > > > > > > Yep - I had a version where the input files were in a very similar > > > format (PGM, 1 byte per pixel). I'll add that back, but without the > > > small PGM header in the files. > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > To: "David Kelly" > > > Sent: Saturday, March 9, 2013 5:04:43 PM > > > Subject: Re: runs for OSG talk > > > > > > I think we need to cut down the size of these files for a demo > > > (although they are great for a stress test). 
> > > > > > First, the RGB format by itself uses 3 bytes per pixel when it only > > > needs one (for land use) > > > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). > > > > > > I tried that using simple convert statements, but it always seems to > > > yield a file exactly double what it should be. > > > > > > More on this later; was hoping to get things working "as is" first. > > > > > > I assume you could get the perl code to work on one-byte-per-pixel > > > instead of the default 3 for the convert rgb format? > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > From: "David Kelly" > > > > To: "Michael Wilde" > > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > That would probably be a good idea for a new script, to show how to > > > > stage apps like that. For now I updated the scripts on lustre.. > > > > hopefully that helps. > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "Michael Wilde" > > > > To: "David Kelly" > > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > OK, I see that its trying to run getlanduse.sh from your /lustre > > > > dir > > > > on beagle, which is different than the one Ive got checked out. It > > > > seems to get an error in a stderr redirect??? Let me se what I need > > > > to do to get the beagle side in sync. > > > > > > > > Seems like since these are perl scripts, we should make the app() > > > > /bin/sh and send the script as data, perhaps? > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > From: "Michael Wilde" > > > > > To: "David Kelly" > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > OK, making progress. Now I dialed down the throttle and node > > > > > counts > > > > > to 48 jobs. 
> > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > > > site:269 > > > > > Submitting:47 Submitted:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > > > site:269 > > > > > Stage in:1 Submitted:47 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > > > site:269 > > > > > Stage in:25 Submitted:23 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > > > site:269 > > > > > Stage in:47 Active:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > > > site:269 > > > > > Stage in:36 Active:12 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > > > site:269 > > > > > Stage in:24 Active:24 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > > > site:269 > > > > > Stage in:24 Active:23 Stage out:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > > > site:269 > > > > > Stage in:14 Active:33 Stage out:1 > > > > > Execution failed: > > > > > Exception in getlanduse: > > > > > Arguments: > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > Host: beagle > > > > > Directory: > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > Caused by: > > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed > > > > > with an exit code of 
1 > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > real 2m31.463s > > > > > user 1m33.238s > > > > > sys 0m2.160s > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > /home/wilde/.swift/runs/completed > > > > > midway001$ > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used was > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > Not yet sure what the diff is... > > > > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > > (128.135.112.71 > > > > > > > for midway-login1), not a local address or an infiniband > > > > > > > address. > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > I just got it working. 
I had to adjust for the differences in > > > > > > > my > > > > > > > username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME > > > > > > > on > > > > > > > Midway to the IP address, rather than the full hostname > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > > > Yes. And I verified that I can ssh to login4 on beagle from > > > > > > > my > > > > > > > midway > > > > > > > session (as indeed the scp's of the proxy files seem to be > > > > > > > working) > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > > > Ignore what I said about "problem finding java" - thats > > > > > > > > code > > > > > > > > in > > > > > > > > the > > > > > > > > very long escaped shell command that gets sent to the > > > > > > > > remote > > > > > > > > side. > > > > > > > > I > > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 etc > > > > > > > > on > > > > > > > > swift.rcc, and that seems OK. 
> > > > > > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on > > > > > > > > the > > > > > > > > midway > > > > > > > > side. And the beagle side seems to be connecting there. > > > > > > > > > > > > > > > > Im a bit confused about the timestamps I see for the proxy > > > > > > > > expiration > > > > > > > > time, but am not yet suspicious of that (although it seems > > > > > > > > less > > > > > > > > than > > > > > > > > 5 hours past GMT... not sure.) > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. looking into it > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with > > > > > > > > > finding > > > > > > > > > Java, > > > > > > > > > I > > > > > > > > > assume on beagle, ans also service ending (presumably > > > > > > > > > coaster > > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I think > > > > > > > > > answers > > > > > > > > > my > > > > > > > > > question about security. 
> > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to work, > > > > > > > > > > same > > > > > > > > > > error > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with automatic > > > > > > > > > > coasters, > > > > > > > > > > what > > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the midway > > > > > > > > > > hosts > > > > > > > > > > and > > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy etc? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the default > > > > > > > > > > > templates > > > > > > > > > > > is > > > > > > > > > > > to > > > > > > > > > > > create > > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if that's > > > > > > > > > > > what > > > > > > > > > > > you > > > > > > > > > > > mean > > > > > > > > > > > by > > > > > > > > > > > a local sites dir or not). 
But you are right about > > > > > > > > > > > Midway > > > > > > > > > > > - > > > > > > > > > > > I > > > > > > > > > > > have > > > > > > > > > > > noticed that when using modis it will sometimes get > > > > > > > > > > > stuck > > > > > > > > > > > when > > > > > > > > > > > it > > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > > replication > > > > > > > > > > > would > > > > > > > > > > > be > > > > > > > > > > > able to help better handle that, but I haven't had > > > > > > > > > > > much > > > > > > > > > > > luck > > > > > > > > > > > with > > > > > > > > > > > that yet. Another way around this may be to add this > > > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > > swift-devel > > > > > > > > > > > for > > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > > relatively > > > > > > > > > > > simple > > > > > > > > > > > though.. probably worth fixing before release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay > > > > > > > > > > > Tue > > > > > > > > > > > night > > > > > > > > > > > to > > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. 
> > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can modify > > > > > > > > > > > the > > > > > > > > > > > sites > > > > > > > > > > > templates; thats not working for me either yet. > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb (but > > > > > > > > > > > not > > > > > > > > > > > both) > > > > > > > > > > > and > > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > > filled > > > > > > > > > > > and > > > > > > > > > > > not > > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > > fiddle > > > > > > > > > > > jobsPerNode > > > > > > > > > > > to get at least 1 core when the system is busy and > > > > > > > > > > > *pretend* > > > > > > > > > > > that > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > > working > > > > > > > > > > > because > > > > > > > > > > > the > > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting produced - > > > > > > > > > > > I > > > > > > > > > > > thought > > > > > > > > > > > we > > > > > > > > > > > eliminated that. Did it come back due to a problem > > > > > > > > > > > with > > > > > > > > > > > that > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > > most > > > > > > > > > > > > interesting/useful talks will be on Tuesday. Monday > > > > > > > > > > > > I'll > > > > > > > > > > > > come > > > > > > > > > > > > to > > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > > finishing > > > > > > > > > > > > touches > > > > > > > > > > > > on > > > > > > > > > > > > any slides/runs/scripts, then drive to Indianapolis > > > > > > > > > > > > on > > > > > > > > > > > > Monday > > > > > > > > > > > > afternoon/evening. I have a hotel booked for Monday > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked about. > > > > > > > > > > > > I'm > > > > > > > > > > > > pretty > > > > > > > > > > > > sure > > > > > > > > > > > > I > > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > > talked > > > > > > > > > > > > about, > > > > > > > > > > > > so > > > > > > > > > > > > I > > > > > > > > > > > > think it's really just a matter of plugging in the > > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking into > > > > > > > > > > > > the > > > > > > > > > > > > run > > > > > > > > > > > > options > > > > > > > > > > > > now. Im hoping to try a few... WIll see how much > > > > > > > > > > > > help > > > > > > > > > > > > I > > > > > > > > > > > > need. > > > > > > > > > > > > Have > > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion of > > > > > > > > > > > > the > > > > > > > > > > > > OSG > > > > > > > > > > > > meeting > > > > > > > > > > > > you > > > > > > > > > > > > feel is of value. The only thing I ask is that for > > > > > > > > > > > > Wed > > > > > > > > > > > > and > > > > > > > > > > > > Thu > > > > > > > > > > > > you > > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > > assistance > > > > > > > > > > > > needs > > > > > > > > > > > > that come up here. And that you engage with people > > > > > > > > > > > > that > > > > > > > > > > > > can > > > > > > > > > > > > help > > > > > > > > > > > > us > > > > > > > > > > > > develop the Swift user community and reliable OSG > > > > > > > > > > > > usage. 
> > > > > > > > > > > > Rob, > > > > > > > > > > > > Marco, > > > > > > > > > > > > Lincoln, and Suchandra would be good to hang out > > > > > > > > > > > > with > > > > > > > > > > > > and > > > > > > > > > > > > they > > > > > > > > > > > > can > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > UChicago > > > > > > > > > > > > travel > > > > > > > > > > > > expense > > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > > additional > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > funds to make Swift do smarter data management on > > > > > > > > > > > > OSG > > > > > > > > > > > > sites > > > > > > > > > > > > (and > > > > > > > > > > > > in > > > > > > > > > > > > general) so anything you learn about OSG storage > > > > > > > > > > > > elements/services/tools will be valuable for that > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on the > > > > > > > > > > > > talk, > > > > > > > > > > > > OK? > > > > > > > > > > > > Im > > > > > > > > > > > > hoping > > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > tests > > > > > > > > > > > > to cover the "routes" we discussed, that would pave > > > > > > > > > > > > the > > > > > > > > > > > > way > > > > > > > > > > > > for > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > >
> > > > > > > > > > > > - Mike
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > Argonne National Laboratory

--
Ketan

From hategan at mcs.anl.gov Sun Mar 10 01:36:26 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 09 Mar 2013 23:36:26 -0800
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1737988661.1232489.1362872981749.JavaMail.root@mcs.anl.gov>
References: <1737988661.1232489.1362872981749.JavaMail.root@mcs.anl.gov>
Message-ID: <1362900986.29839.4.camel@echo>

Please try now. I made some changes:

1. start the service with "-l" so that things in your .profile (such as module load sun-java) would be picked up.
However, this also means that you should unset the X509_* variables, or the sshcl proxy forwarding will not work properly.

2. I fixed a bug that caused an extra connection to the coaster service. Normally the service connects back to the client and both use that connection. However, due to some changes in the way credentials were set for jobs, and the fact that connections were looked up based on both hostname and credential, the coaster client would ignore the existing connection and create another one. The initial one would then time out at some point, causing the service to crash.

Mihael

On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> An update on this provider staging related issue: reducing filesize from 17MB to 600KB runs well.
>
> So seems like some kind of flow control or buffer management problem, possibly?
>
> May need to take that problem offline - would be a perfect test case for Yadu to develop a new stress test for.
>
> - Mike
>
>
> ----- Forwarded Message -----
> From: "Michael Wilde"
> To: "David Kelly"
> Sent: Saturday, March 9, 2013 5:21:49 PM
> Subject: Re: runs for OSG talk
>
> OK, much better: with 600K files (5x5 reduction or 25X smaller) it works well, and fast (from midway to beagle!)
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2319-5zq0jrfg > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 Stage in:46 Active:1 Stage out:1 > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 Stage in:19 Active:28 Stage out:1 Finished successfully:19 > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41 > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 Stage in:38 Active:1 Stage out:9 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 Stage in:30 Submitting:8 Stage out:9 Finished successfully:58 > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 Stage in:38 Submitting:8 Submitted:1 Finished successfully:67 > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 Stage in:19 Stage out:28 Finished successfully:68 > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished successfully:97 > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 Stage in:31 Submitting:2 Stage out:14 Finished successfully:100 > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 Stage in:30 Submitting:10 Stage out:6 Finished successfully:109 > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 Finished 
successfully:115 > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 Stage in:21 Active:10 Stage out:16 Finished successfully:116 > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished successfully:143 > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 Stage in:31 Active:2 Stage out:15 Finished successfully:145 > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 Stage in:30 Submitting:14 Stage out:3 Finished successfully:160 > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished successfully:163 > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 Finished successfully:165 > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished successfully:191 > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 Stage in:30 Stage out:17 Finished successfully:194 > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 Stage in:29 Submitting:18 Active:1 Finished successfully:211 > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 Stage in:33 Active:3 Stage out:12 Finished successfully:211 > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished successfully:225 > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 Stage in:29 Active:14 Stage out:3 Finished successfully:241 > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 Stage in:28 Submitting:2 Stage out:17 Finished successfully:242 > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 Stage in:30 Submitting:17 Submitted:1 Finished successfully:259 > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 Stage in:35 
Stage out:13 Finished successfully:259 > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 Submitting:6 Submitted:3 Stage out:15 Finished successfully:272 > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 Stage out:14 Finished successfully:288 > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317 > > real 0m58.953s > user 0m32.573s > sys 0m1.263s > + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed > midway001$ > > > > ----- Original Message ----- > > From: "David Kelly" > > To: "Michael Wilde" > > Sent: Saturday, March 9, 2013 5:12:59 PM > > Subject: Re: runs for OSG talk > > > > > > Yep - I had a version where the input files were in a very similar > > format (PGM, 1 byte per pixel). I'll add that back, but without the > > small PGM header in the files. > > > > ----- Original Message ----- > > > > > > From: "Michael Wilde" > > To: "David Kelly" > > Sent: Saturday, March 9, 2013 5:04:43 PM > > Subject: Re: runs for OSG talk > > > > I think we need to cut down the size of these files for a demo > > (although they are great for a stress test). > > > > First, the RGB format by itself uses 3 bytes per pixel when it only > > needs one (for land use) > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). > > > > I tried that using simple convert statements, but it always seems to > > yield a file exactly double what it should be. > > > > More on this later; was hoping to get things working "as is" first. > > > > I assume you could get the perl code to work on one-byte-per-pixel > > instead of the default 3 for the convert rgb format? > > > > - Mike > > > > ----- Original Message ----- > > > From: "David Kelly" > > > To: "Michael Wilde" > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > Subject: Re: runs for OSG talk > > > > > > > > > That would probably be a good idea for a new script, to show how to > > > stage apps like that. 
For now I updated the scripts on lustre.. > > > hopefully that helps. > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > To: "David Kelly" > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > Subject: Re: runs for OSG talk > > > > > > OK, I see that its trying to run getlanduse.sh from your /lustre > > > dir > > > on beagle, which is different than the one Ive got checked out. It > > > seems to get an error in a stderr redirect??? Let me se what I need > > > to do to get the beagle side in sync. > > > > > > Seems like since these are perl scripts, we should make the app() > > > /bin/sh and send the script as data, perhaps? > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > From: "Michael Wilde" > > > > To: "David Kelly" > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > OK, making progress. Now I dialed down the throttle and node > > > > counts > > > > to 48 jobs. > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > > site:269 > > > > Submitting:47 Submitted:1 > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > > site:269 > > > > Stage in:1 Submitted:47 > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > > site:269 > > > > Stage in:25 Submitted:23 > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > > site:269 > > > > Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > > site:269 > > > > Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > > site:269 > > > > Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > > site:269 > > > > Stage in:48 > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > > site:269 > > 
> > Stage in:47 Active:1 > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > > site:269 > > > > Stage in:36 Active:12 > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > > site:269 > > > > Stage in:24 Active:24 > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > > site:269 > > > > Stage in:24 Active:23 Stage out:1 > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > > site:269 > > > > Stage in:14 Active:33 Stage out:1 > > > > Execution failed: > > > > Exception in getlanduse: > > > > Arguments: > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > Host: beagle > > > > Directory: > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > Caused by: > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed > > > > with an exit code of 1 > > > > getLandUse, modis02.swift, line 20 > > > > > > > > real 2m31.463s > > > > user 1m33.238s > > > > sys 0m2.160s > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > /home/wilde/.swift/runs/completed > > > > midway001$ > > > > > > > > > > > > ----- Original Message ----- > > > > > From: "David Kelly" > > > > > To: "Michael Wilde" > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used was > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > To: "David Kelly" > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > Not yet sure what the diff is... 
> > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > - Mike > > > > > > > > > > ----- Original Message ----- > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > (128.135.112.71 > > > > > > for midway-login1), not a local address or an infiniband > > > > > > address. > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > I just got it working. I had to adjust for the differences in > > > > > > my > > > > > > username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME > > > > > > on > > > > > > Midway to the IP address, rather than the full hostname > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > Yes. 
And I verified that I can ssh to login4 on beagle from > > > > > > my > > > > > > midway > > > > > > session (as indeed the scp's of the proxy files seem to be > > > > > > working) > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > Ignore what I said about "problem finding java" - thats > > > > > > > code > > > > > > > in > > > > > > > the > > > > > > > very long escaped shell command that gets sent to the > > > > > > > remote > > > > > > > side. > > > > > > > I > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 etc > > > > > > > on > > > > > > > swift.rcc, and that seems OK. > > > > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on > > > > > > > the > > > > > > > midway > > > > > > > side. And the beagle side seems to be connecting there. > > > > > > > > > > > > > > Im a bit confused about the timestamps I see for the proxy > > > > > > > expiration > > > > > > > time, but am not yet suspicious of that (although it seems > > > > > > > less > > > > > > > than > > > > > > > 5 hours past GMT... not sure.) > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. 
looking into it > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with > > > > > > > > finding > > > > > > > > Java, > > > > > > > > I > > > > > > > > assume on beagle, ans also service ending (presumably > > > > > > > > coaster > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I think > > > > > > > > answers > > > > > > > > my > > > > > > > > question about security. > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to work, > > > > > > > > > same > > > > > > > > > error > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with automatic > > > > > > > > > coasters, > > > > > > > > > what > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the midway > > > > > > > > > hosts > > > > > > > > > and > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy etc? 
> > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "David Kelly" > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the default > > > > > > > > > > templates > > > > > > > > > > is > > > > > > > > > > to > > > > > > > > > > create > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if that's > > > > > > > > > > what > > > > > > > > > > you > > > > > > > > > > mean > > > > > > > > > > by > > > > > > > > > > a local sites dir or not). But you are right about > > > > > > > > > > Midway > > > > > > > > > > - > > > > > > > > > > I > > > > > > > > > > have > > > > > > > > > > noticed that when using modis it will sometimes get > > > > > > > > > > stuck > > > > > > > > > > when > > > > > > > > > > it > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > replication > > > > > > > > > > would > > > > > > > > > > be > > > > > > > > > > able to help better handle that, but I haven't had > > > > > > > > > > much > > > > > > > > > > luck > > > > > > > > > > with > > > > > > > > > > that yet. Another way around this may be to add this > > > > > > > > > > to > > > > > > > > > > the > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > swift-devel > > > > > > > > > > for > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > relatively > > > > > > > > > > simple > > > > > > > > > > though.. probably worth fixing before release. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay > > > > > > > > > > Tue > > > > > > > > > > night > > > > > > > > > > to > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can modify > > > > > > > > > > the > > > > > > > > > > sites > > > > > > > > > > templates; thats not working for me either yet. > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb (but > > > > > > > > > > not > > > > > > > > > > both) > > > > > > > > > > and > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > filled > > > > > > > > > > and > > > > > > > > > > not > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > fiddle > > > > > > > > > > jobsPerNode > > > > > > > > > > to get at least 1 core when the system is busy and > > > > > > > > > > *pretend* > > > > > > > > > > that > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > working > > > > > > > > > > because > > > > > > > > > > the > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting produced - > > > > > > > > > > I > > > > > > > > > > thought > > > > > > > > > > we > > > > > > > > > > eliminated that. 
Did it come back due to a problem > > > > > > > > > > with > > > > > > > > > > that > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > most > > > > > > > > > > > interesting/useful talks will be on Tuesday. Monday > > > > > > > > > > > I'll > > > > > > > > > > > come > > > > > > > > > > > to > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > finishing > > > > > > > > > > > touches > > > > > > > > > > > on > > > > > > > > > > > any slides/runs/scripts, then drive to Indianapolis > > > > > > > > > > > on > > > > > > > > > > > Monday > > > > > > > > > > > afternoon/evening. I have a hotel booked for Monday > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked about. > > > > > > > > > > > I'm > > > > > > > > > > > pretty > > > > > > > > > > > sure > > > > > > > > > > > I > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > talked > > > > > > > > > > > about, > > > > > > > > > > > so > > > > > > > > > > > I > > > > > > > > > > > think it's really just a matter of plugging in the > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking into > > > > > > > > > > > the > > > > > > > > > > > run > > > > > > > > > > > options > > > > > > > > > > > now. Im hoping to try a few... WIll see how much > > > > > > > > > > > help > > > > > > > > > > > I > > > > > > > > > > > need. > > > > > > > > > > > Have > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion of > > > > > > > > > > > the > > > > > > > > > > > OSG > > > > > > > > > > > meeting > > > > > > > > > > > you > > > > > > > > > > > feel is of value. The only thing I ask is that for > > > > > > > > > > > Wed > > > > > > > > > > > and > > > > > > > > > > > Thu > > > > > > > > > > > you > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > assistance > > > > > > > > > > > needs > > > > > > > > > > > that come up here. And that you engage with people > > > > > > > > > > > that > > > > > > > > > > > can > > > > > > > > > > > help > > > > > > > > > > > us > > > > > > > > > > > develop the Swift user community and reliable OSG > > > > > > > > > > > usage. > > > > > > > > > > > Rob, > > > > > > > > > > > Marco, > > > > > > > > > > > Lincoln, and Suchandra would be good to hang out > > > > > > > > > > > with > > > > > > > > > > > and > > > > > > > > > > > they > > > > > > > > > > > can > > > > > > > > > > > introduce you to good contacts. 
> > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > UChicago > > > > > > > > > > > travel > > > > > > > > > > > expense > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > additional > > > > > > > > > > > ExTENCI > > > > > > > > > > > funds to make Swift do smarter data management on > > > > > > > > > > > OSG > > > > > > > > > > > sites > > > > > > > > > > > (and > > > > > > > > > > > in > > > > > > > > > > > general) so anything you learn about OSG storage > > > > > > > > > > > elements/services/tools will be valuable for that > > > > > > > > > > > (srmcp, > > > > > > > > > > > lcgcp, > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on the > > > > > > > > > > > talk, > > > > > > > > > > > OK? > > > > > > > > > > > Im > > > > > > > > > > > hoping > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > hello-world-like > > > > > > > > > > > tests > > > > > > > > > > > to cover the "routes" we discussed, that would pave > > > > > > > > > > > the > > > > > > > > > > > way > > > > > > > > > > > for > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns (other than > > > > > > > > > > > the > > > > > > > > > > > fact > > > > > > > > > > > that > > > > > > > > > > > this is a tad rushed ;) > > > > > > > > > > > > > > > > > > > > > > Thanks and regards, > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Michael Wilde > > > > > > > > > > > Computation Institute, University of Chicago > > > > > > > > > > > Mathematics and Computer Science Division > > > > > > > > > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadudoc1729 at gmail.com Sun Mar 10 01:45:18 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Sun, 10 Mar 2013 13:15:18 +0530 Subject: [Swift-devel] a note on stress testing In-Reply-To: <434037777.1232168.1362870778578.JavaMail.root@mcs.anl.gov> References: <1522642899.1232101.1362870556551.JavaMail.root@mcs.anl.gov> <434037777.1232168.1362870778578.JavaMail.root@mcs.anl.gov> Message-ID: Hi Mike, From the last meeting we had, my understanding was that we'd need three types of runs based on where swift runs: -> Local | Local ( Swift runs locally with jobs also run locally ) -> Local | Remote ( Swift runs locally but the compute resources are remote providers ? ) -> Remote | Local ( Swift instantiated on a remote system to coordinate resources locally ) In the case you have pointed out, Swift on Midway submits jobs to beagle. Is this a remote|remote type which I haven't listed?
Can we expect to see the same errors, if we were to run swift on an MCS machine to submit the same jobs to beagle? -Yadu On Sun, Mar 10, 2013 at 4:42 AM, Michael Wilde wrote: > Yadu, below is exactly the kind of error Im hoping we can catch in the test suite. > > The one below is happening on remote submissions from midway to beagle using coaster provider staging of 17MB input files. > > So it might need both site-config and stress testing concurrently, to detect. > > - Mike > > > ----- Forwarded Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Saturday, March 9, 2013 5:09:16 PM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > See instead run028. Errors below. > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > RunID: 20130309-2252-x37dmuy0 > Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 > Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 Submitting:47 Submitted:1 > Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 Stage in:1 Submitted:47 > Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 Stage in:48 > Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 Stage in:47 Active:1 > Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 Stage in:42 Active:6 > Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 Stage in:24 Active:24 > Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 Active:47 Stage out:1 > Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 Stage in:2 Submitted:1 Active:44 Stage out:1 Finished successfully:3 > Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 Selecting site:261 Stage in:5 Submitting:2 
Submitted:1 Active:37 Stage out:3 Finished successfully:8 > Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 Stage in:12 Submitting:3 Active:24 Stage out:8 Finished successfully:16 > Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 Stage in:23 Submitting:5 Active:15 Stage out:4 Finished successfully:29 > Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 > Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 > Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 Stage in:47 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 Stage in:48 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 > Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> 
BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60231 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 Stage in:47 Finished successfully:50 > Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 Stage in:47 Submitted:1 Finished successfully:50 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60507 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 Stage in:47 Active:1 Finished successfully:50 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] -> BufferingChannel, null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] -> BufferingChannel, null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60742 > Meta context: service-60121 > Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 Stage in:46 Active:2 Finished successfully:50 > Execution failed: > 
Exception in getlanduse: > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] > Host: beagle > Directory: modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l > > Caused by: > Shutting down worker > getLandUse, modis02.swift, line 20 > Attempted to unregister unregistered handler with id 526 > Attempted to unregister unregistered handler with id 534 > Attempted to unregister unregistered handler with id 430 > Attempted to unregister unregistered handler with id 476 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 337 > Failed to abort transfer > 
java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) > at org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Failed to abort transfer > java.util.ConcurrentModificationException > at 
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at 
org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 348 > Attempted to unregister unregistered handler with id 466 > Attempted to unregister unregistered handler with id 347 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 456 > Attempted to unregister unregistered handler with id 454 > Attempted to unregister unregistered handler with id 508 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at 
org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 511 > Attempted to unregister unregistered handler with id 506 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at 
org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 380 > Attempted to unregister unregistered handler with id 502 > Attempted to unregister unregistered handler with id 376 > Attempted to unregister unregistered handler with id 226 > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at 
org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Failed to abort transfer > java.util.ConcurrentModificationException > at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) > at java.util.LinkedList$ListItr.next(LinkedList.java:886) > at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) > at org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) > at org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at 
org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Attempted to unregister unregistered handler with id 484 > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-66-1-1-1362869544093) > Task being removed twice? > java.lang.Throwable > at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.removeTask(AbstractGridNode.java:291) > at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:263) > at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:136) > at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168) > at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:665) > at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:428) > at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:426) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) > at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) > at org.globus.cog.abstraction.impl.execution.coaster.JobSubmissionTaskHandler.errorReceived(JobSubmissionTaskHandler.java:219) > at org.globus.cog.karajan.workflow.service.commands.Command.errorReceived(Command.java:191) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:227) > at org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) > at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) > at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) > at 
org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-244-1-1-1362869544091) > Ex098 > java.lang.NullPointerException > at org.globus.cog.karajan.arguments.NamedArgumentsImpl.merge(NamedArgumentsImpl.java:52) > at org.globus.cog.karajan.workflow.nodes.SequentialChoice.commitBuffers(SequentialChoice.java:46) > at org.globus.cog.karajan.workflow.nodes.SequentialChoice.completed(SequentialChoice.java:40) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:194) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Could not fail element > Attempted to close nonexistent channel buffers > > at org.globus.cog.karajan.arguments.ArgUtil.closeBuffers(ArgUtil.java:279) > at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.closeBuffers(AbstractParallelIterator.java:107) > at org.globus.cog.karajan.workflow.nodes.AbstractParallelIterator.failed(AbstractParallelIterator.java:143) > at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:89) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:151) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:98) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-84-1-1-1362869544098) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-64-1-1-1362869544095) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-45-1-1-1362869544101) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-79-1-1-1362869544099) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-68-1-1-1362869544108) > Task had no contacts Task(type=JOB_SUBMISSION, identity=urn:0-6-63-1-1-1362869544103) > Task had no contacts Task(type=JOB_SUBMISSION, 
identity=urn:0-6-256-1-1-1362869544077) > error null > error null > > real 4m27.856s > user 2m45.576s > sys 0m3.697s > + mv /home/wilde/.swift/runs/current/run028.1362869541 /home/wilde/.swift/runs/completed > midway001$ > > > ----- Original Message ----- >> From: "Michael Wilde" >> To: "Mihael Hategan" >> Cc: "Swift Devel" >> Sent: Saturday, March 9, 2013 5:05:25 PM >> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle >> >> Mihael, now I think I have a coaster problem. Curiously, it always >> seems to happen about 5 minutes into the run. >> >> Logs for these runs are on midway in, e.g., >> /home/wilde/osgdemo/modis/svn/run027 >> >> The leading portion of the error from stdout/err is below. >> >> - Mike >> >> Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) >> >> RunID: 20130309-2252-x37dmuy0 >> Progress: time: Sat, 09 Mar 2013 22:52:24 +0000 >> Progress: time: Sat, 09 Mar 2013 22:52:34 +0000 Selecting site:269 >> Submitting:47 Submitted:1 >> Progress: time: Sat, 09 Mar 2013 22:52:42 +0000 Selecting site:269 >> Stage in:1 Submitted:47 >> Progress: time: Sat, 09 Mar 2013 22:52:54 +0000 Selecting site:269 >> Stage in:48 >> Progress: time: Sat, 09 Mar 2013 22:53:24 +0000 Selecting site:269 >> Stage in:48 >> Progress: time: Sat, 09 Mar 2013 22:53:54 +0000 Selecting site:269 >> Stage in:48 >> Progress: time: Sat, 09 Mar 2013 22:54:24 +0000 Selecting site:269 >> Stage in:48 >> Progress: time: Sat, 09 Mar 2013 22:54:51 +0000 Selecting site:269 >> Stage in:47 Active:1 >> Progress: time: Sat, 09 Mar 2013 22:54:52 +0000 Selecting site:269 >> Stage in:42 Active:6 >> Progress: time: Sat, 09 Mar 2013 22:54:54 +0000 Selecting site:269 >> Stage in:24 Active:24 >> Progress: time: Sat, 09 Mar 2013 22:54:57 +0000 Selecting site:269 >> Active:47 Stage out:1 >> Progress: time: Sat, 09 Mar 2013 22:54:58 +0000 Selecting site:266 >> Stage in:2 Submitted:1 Active:44 Stage out:1 Finished >> successfully:3 >> Progress: time: Sat, 09 Mar 2013 22:55:00 +0000 
Selecting site:261 >> Stage in:5 Submitting:2 Submitted:1 Active:37 Stage out:3 >> Finished successfully:8 >> Progress: time: Sat, 09 Mar 2013 22:55:01 +0000 Selecting site:254 >> Stage in:12 Submitting:3 Active:24 Stage out:8 Finished >> successfully:16 >> Progress: time: Sat, 09 Mar 2013 22:55:02 +0000 Selecting site:241 >> Stage in:23 Submitting:5 Active:15 Stage out:4 Finished >> successfully:29 >> Progress: time: Sat, 09 Mar 2013 22:55:03 +0000 Selecting site:234 >> Stage in:28 Submitting:7 Stage out:12 Finished successfully:36 >> Progress: time: Sat, 09 Mar 2013 22:55:04 +0000 Selecting site:221 >> Stage in:35 Submitting:12 Submitted:1 Finished successfully:48 >> Progress: time: Sat, 09 Mar 2013 22:55:24 +0000 Selecting site:221 >> Stage in:48 Finished successfully:48 >> Progress: time: Sat, 09 Mar 2013 22:55:54 +0000 Selecting site:221 >> Stage in:48 Finished successfully:48 >> Progress: time: Sat, 09 Mar 2013 22:56:08 +0000 Selecting site:221 >> Stage in:47 Active:1 Finished successfully:48 >> Progress: time: Sat, 09 Mar 2013 22:56:14 +0000 Selecting site:221 >> Stage in:47 Stage out:1 Finished successfully:48 >> Progress: time: Sat, 09 Mar 2013 22:56:16 +0000 Selecting site:221 >> Stage in:47 Finished successfully:49 >> Progress: time: Sat, 09 Mar 2013 22:56:19 +0000 Selecting site:220 >> Stage in:47 Submitted:1 Finished successfully:49 >> Progress: time: Sat, 09 Mar 2013 22:56:24 +0000 Selecting site:220 >> Stage in:48 Finished successfully:49 >> Progress: time: Sat, 09 Mar 2013 22:56:29 +0000 Selecting site:220 >> Stage in:47 Active:1 Finished successfully:49 >> Progress: time: Sat, 09 Mar 2013 22:56:35 +0000 Selecting site:220 >> Stage in:47 Stage out:1 Finished successfully:49 >> Channels: >> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] >> -> >> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] >> -> 
BufferingChannel, >> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] >> -> BufferingChannel, >> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] >> -> >> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} >> Context: service-60231 >> Meta context: service-60121 >> Progress: time: Sat, 09 Mar 2013 22:56:37 +0000 Selecting site:220 >> Stage in:47 Finished successfully:50 >> Progress: time: Sat, 09 Mar 2013 22:56:40 +0000 Selecting site:219 >> Stage in:47 Submitted:1 Finished successfully:50 >> Channels: >> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] >> -> >> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] >> -> BufferingChannel, >> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] >> -> BufferingChannel, >> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] >> -> >> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} >> Context: service-60507 >> Meta context: service-60121 >> Progress: time: Sat, 09 Mar 2013 22:56:44 +0000 Selecting site:219 >> Stage in:47 Active:1 Finished successfully:50 >> Channels: >> {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] >> -> >> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60121] >> -> BufferingChannel, >> null at id://u-7c5b9ab4-13d515b0ca0--8000-u2ecf5a73-13d515b0cac--8000S=MetaChannel[service-60121] >> -> BufferingChannel, >> null at id://u2ecf5a73-13d515b0cac--7fff-u-7c5b9ab4-13d515b0ca0--7fffC=MetaChannel[https://192.5.86.107:50000] >> -> >> 
GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} >> Context: service-60742 >> Meta context: service-60121 >> Progress: time: Sat, 09 Mar 2013 22:56:48 +0000 Selecting site:219 >> Stage in:46 Active:2 Finished successfully:50 >> Execution failed: >> Exception in getlanduse: >> Arguments: >> [home/wilde/osgdemo/modis/svn/data/modis/2002/h16v07.rgb] >> Host: beagle >> Directory: >> modis02-20130309-2252-x37dmuy0/jobs/d/getlanduse-d3q9ld6l >> >> Caused by: >> Shutting down worker >> getLandUse, modis02.swift, line 20 >> Attempted to unregister unregistered handler with id 526 >> Attempted to unregister unregistered handler with id 534 >> Attempted to unregister unregistered handler with id 430 >> Attempted to unregister unregistered handler with id 476 >> Failed to abort transfer >> java.util.ConcurrentModificationException >> at >> java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) >> at java.util.LinkedList$ListItr.next(LinkedList.java:886) >> at >> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) >> at >> org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) >> at >> org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) >> at >> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) >> at >> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.channelShutDown(ChannelContext.java:313) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:292) 
>> at >> org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleChannelException(AbstractKarajanChannel.java:560) >> at >> org.globus.cog.karajan.workflow.service.channels.Sender.run(Sender.java:74) >> Attempted to unregister unregistered handler with id 337 >> Failed to abort transfer >> java.util.ConcurrentModificationException >> at >> java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) >> at java.util.LinkedList$ListItr.next(LinkedList.java:886) >> at >> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) >> at >> org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) >> at >> org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) >> at >> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) >> at >> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226) >> at >> org.globus.cog.abstraction.impl.execution.coaster.ServiceManager.statusChanged(ServiceManager.java:267) >> at >> org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:240) >> at >> org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:228) >> at >> org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:57) >> at >> org.globus.cog.abstraction.impl.execution.local.JobSubmissionTaskHandler.run(JobSubmissionTaskHandler.java:257) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) >> at 
java.util.concurrent.FutureTask.run(FutureTask.java:166) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >> at java.lang.Thread.run(Thread.java:722) >> Failed to abort transfer >> java.util.ConcurrentModificationException >> at >> java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953) >> at java.util.LinkedList$ListItr.next(LinkedList.java:886) >> at >> org.globus.cog.karajan.workflow.service.handlers.RequestHandler.send(RequestHandler.java:72) >> at >> org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:139) >> at >> org.globus.cog.karajan.workflow.service.RequestReply.sendError(RequestReply.java:111) >> at >> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.abort(GetFileHandler.java:195) >> at >> org.globus.cog.abstraction.impl.file.coaster.handlers.GetFileHandler.errorReceived(GetFileHandler.java:113) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyListeners(ChannelContext.java:239) >> at >> org.globus.cog.karajan.workflow.service.channels.ChannelContext.notifyRegisteredCommandsAndHandlers(ChannelContext.java:226 >> >> ----- Original Message ----- >> > From: "Michael Wilde" >> > To: "Mihael Hategan" >> > Cc: "Swift Devel" >> > Sent: Saturday, March 9, 2013 4:24:17 PM >> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from >> > midway to beagle >> > >> > I forgot to paste the error, sorry. It's below now, for real. When I dial down the throttle to 48 and only start 2 beagle nodes, I get further and the app calls make it to active state. The 317 files being staged in here are 17 MB each. 
>> > >> > The swift progress output and error are below: >> > >> > RunID: 20130309-2204-qu9ck076 >> > Progress: time: Sat, 09 Mar 2013 22:04:34 +0000 >> > Progress: time: Sat, 09 Mar 2013 22:04:45 +0000 Submitting:316 >> > Submitted:1 >> > Progress: time: Sat, 09 Mar 2013 22:04:51 +0000 Stage in:1 >> > Submitted:316 >> > Progress: time: Sat, 09 Mar 2013 22:04:52 +0000 Stage in:25 >> > Submitted:292 >> > Progress: time: Sat, 09 Mar 2013 22:04:53 +0000 Stage in:68 >> > Submitted:249 >> > Progress: time: Sat, 09 Mar 2013 22:04:55 +0000 Stage in:113 >> > Submitted:204 >> > Progress: time: Sat, 09 Mar 2013 22:04:56 +0000 Stage in:165 >> > Submitted:152 >> > Progress: time: Sat, 09 Mar 2013 22:04:58 +0000 Stage in:177 >> > Submitted:140 >> > Progress: time: Sat, 09 Mar 2013 22:05:00 +0000 Stage in:225 >> > Submitted:92 >> > Progress: time: Sat, 09 Mar 2013 22:05:04 +0000 Stage in:241 >> > Submitted:76 >> > Progress: time: Sat, 09 Mar 2013 22:05:05 +0000 Stage in:289 >> > Submitted:28 >> > Progress: time: Sat, 09 Mar 2013 22:05:09 +0000 Stage in:305 >> > Submitted:12 >> > Progress: time: Sat, 09 Mar 2013 22:05:34 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:06:04 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:06:34 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:07:04 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:07:34 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:08:04 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:08:34 +0000 Stage in:317 >> > Channels: >> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] >> > -> >> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] >> > -> BufferingChannel, >> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] >> > -> >> > 
GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] >> > -> BufferingChannel} >> > Context: service-60822 >> > Meta context: service-60640 >> > Channels: >> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] >> > -> >> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] >> > -> BufferingChannel, >> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] >> > -> >> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] >> > -> BufferingChannel} >> > Context: service-60116 >> > Meta context: service-60640 >> > Channels: >> > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] >> > -> >> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60640] >> > -> BufferingChannel, >> > null at id://u-23c37c02-13d512f435d--7fff-u66598f98-13d512f434d--7fffC=MetaChannel[https://192.5.86.107:50000] >> > -> >> > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], >> > null at id://u66598f98-13d512f434d--8000-u-23c37c02-13d512f435d--8000S=MetaChannel[service-60640] >> > -> BufferingChannel} >> > Context: service-60598 >> > Meta context: service-60640 >> > Progress: time: Sat, 09 Mar 2013 22:09:04 +0000 Stage in:317 >> > Progress: time: Sat, 09 Mar 2013 22:09:08 +0000 Stage in:316 >> > Active:1 >> > Execution failed: >> > Exception in getlanduse: >> > Arguments: >> > [home/wilde/osgdemo/modis/svn/data/modis/2002/h15v02.rgb] >> > Host: beagle >> > Directory: >> > 
modis02-20130309-2204-qu9ck076/jobs/b/getlanduse-bmscjd6l >> > >> > Caused by: >> > Shutting down worker >> > getLandUse, modis02.swift, line 20 >> > error null >> > >> > real 4m36.777s >> > user 2m55.240s >> > sys 0m3.837s >> > >> > >> > --- >> > >> > With a throttle of 48 (.47) and 2 beagle nodes, I see: >> > >> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) >> > >> > RunID: 20130309-2214-1oi3rvea >> > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 >> > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting >> > site:269 >> > Submitting:47 Submitted:1 >> > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting >> > site:269 >> > Stage in:1 Submitted:47 >> > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting >> > site:269 >> > Stage in:25 Submitted:23 >> > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting >> > site:269 >> > Stage in:48 >> > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting >> > site:269 >> > Stage in:48 >> > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting >> > site:269 >> > Stage in:48 >> > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting >> > site:269 >> > Stage in:48 >> > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting >> > site:269 >> > Stage in:47 Active:1 >> > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting >> > site:269 >> > Stage in:36 Active:12 >> > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting >> > site:269 >> > Stage in:24 Active:24 >> > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting >> > site:269 >> > Stage in:24 Active:23 Stage out:1 >> > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting >> > site:269 >> > Stage in:14 Active:33 Stage out:1 >> > Execution failed: >> > Exception in getlanduse: >> > Arguments: >> > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] >> > Host: beagle >> > Directory: >> > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l >> > >> > Caused by: >> > Application 
/lustre/beagle/davidk/modis/bin/getlanduse.sh failed >> > with an exit code of 1 >> > getLandUse, modis02.swift, line 20 >> > >> > real 2m31.463s >> > user 1m33.238s >> > sys 0m2.160s >> > + mv /home/wilde/.swift/runs/current/run024.1362867244 >> > /home/wilde/.swift/runs/completed >> > >> > This error is likely in the demo app code; just pasting here to >> > show >> > that with less concurrency it makes progress. >> > >> > ----- Original Message ----- >> > > From: "Michael Wilde" >> > > To: "Mihael Hategan" >> > > Cc: "Swift Devel" >> > > Sent: Saturday, March 9, 2013 4:11:24 PM >> > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from >> > > midway to beagle >> > > >> > > Now Im getting the error below (from running 317 simple MODIS >> > > apps >> > > concurrently). Im going to dial down the throttle first to see >> > > if >> > > the staging load is overwhelming either coasters or the >> > > midway-beagle path. >> > > >> > > - Mike >> > > >> > > >> > > ----- Original Message ----- >> > > > From: "Michael Wilde" >> > > > To: "Mihael Hategan" >> > > > Cc: "Swift Devel" >> > > > Sent: Saturday, March 9, 2013 3:59:22 PM >> > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from >> > > > midway to beagle >> > > > >> > > > I think we just got this working. Problems may have included >> > > > the >> > > > need >> > > > to pre-create the workdirectory and to specify a dotted IP >> > > > address >> > > > on the external network for GLOBUS_HOSTNAME. Will need to >> > > > experiment. So likely that proxy expiration time was not a >> > > > problem >> > > > (although its confusing). >> > > > >> > > > Will report back on this once the needed steps are clear. 
>> > > >
>> > > > Thanks,
>> > > >
>> > > > - Mike
>> > > >
>> > > > ----- Original Message -----
>> > > > > From: "Mihael Hategan"
>> > > > > To: "Michael Wilde"
>> > > > > Cc: "Swift Devel"
>> > > > > Sent: Saturday, March 9, 2013 3:56:36 PM
>> > > > > Subject: Re: Cant get auto-coasters to run from midway to beagle
>> > > > >
>> > > > > Can you post .globus/coasters/coaster.log from beagle?
>> > > > >
>> > > > > On Sat, 2013-03-09 at 15:46 -0600, Michael Wilde wrote:
>> > > > > > Mihael, can you advise on this problem?
>> > > > > >
>> > > > > > David and I are trying to run automatic coaster jobs from midway
>> > > > > > login hosts and swift.rcc to beagle using ssh-cl:pbs.
>> > > > > >
>> > > > > > My failed attempts are on midway under
>> > > > > > /home/wilde/osgdemo/modis/svn, see eg run020 (which has complete
>> > > > > > logs).
>> > > > > >
>> > > > > > Quick question about the proxy files that get copied. Does this
>> > > > > > look OK? :
>> > > > > >
>> > > > > > 2013-03-09 21:24:46,895+0000 INFO AutoCA Checking certificate /home/wilde/.globus/coasters/proxy.0.pem
>> > > > > > 2013-03-09 21:24:46,967+0000 INFO AutoCA Using certificate /home/wilde/.globus/coasters/proxy.0.pem with expiration date Sat Mar 23 19:25:53 GMT 2013
>> > > > > >
>> > > > > > The proxy expiration time listed above is two hours *earlier* than
>> > > > > > the current time (as seen in the message's UTC timestamp). Is
>> > > > > > that correct, or a possible cause of this problem?
>> > > > > >
>> > > > > > The main symptom seems to be this:
>> > > > > >
>> > > > > > Execution failed:
>> > > > > > Exception in getlanduse:
>> > > > > > Arguments: [../data/modis/2002/h00v09.rgb]
>> > > > > > Host: beagle
>> > > > > > Directory: modis01-20130309-2124-7ua3bde3/jobs/d/getlanduse-d24rhd6l
>> > > > > >
>> > > > > > Caused by:
>> > > > > > Could not submit job
>> > > > > > Caused by:
>> > > > > > Could not start coaster service
>> > > > > > Caused by:
>> > > > > > Task ended before registration was received.
>> > > > > > Failed to download bootstrap jar from http://midway001.rcc.uchicago.edu:50001
>> > > > > > ---
>> > > > > >
>> > > > > > Yet Ive verified that midway login4 (which is the target system)
>> > > > > > can connect to this hostname and port (with nc -l and telnet)
>> > > > > >
>> > > > > > - Mike
>> > > > > >
>> > > > _______________________________________________
>> > > > Swift-devel mailing list
>> > > > Swift-devel at ci.uchicago.edu
>> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Thanks and Regards,
Yadu Nand B

From wilde at mcs.anl.gov Sun Mar 10 11:05:26 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 11:05:26 -0500 (CDT)
Subject: [Swift-devel] Java versions for testing - Re: Cant get auto-coasters to run from midway to beagle
In-Reply-To:
Message-ID: <1318332938.1274845.1362931526599.JavaMail.root@mcs.anl.gov>

Good point, Ketan. We should document this on the site guide for Beagle.

We should also file a bug to fix this some day (ie find out why we dont work well with this java and if possible fix it). But thats very low prio; for now we want to make sure all Swift users run on Sun Java 1.7.

I had sun java in my .modules file. Mihael has now added the -l flag to make sure that automatic coaster bootstrap runs the user's init to pick up such things.

Yadu, this raises another issue for the test plan and suite: testing on multiple Javas and carefully deciding and controlling the Java we use. More important than this esoteric Cray IBM Java is validating Swift on popular open source Javas.

- Mike

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "Mihael Hategan"
> Cc: "Michael Wilde" , "Swift Devel"
> Sent: Saturday, March 9, 2013 8:24:56 PM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
>
> IBM jdk on beagle is known to not function well with Swift coasters.
> We had to switch to Sun jdk for ssh:pbs runs from
> bridled/communicado.
>
>
>
> On Sat, Mar 9, 2013 at 7:43 PM, Mihael Hategan < hategan at mcs.anl.gov >
> wrote:
>
>
> I noticed some random weirdness due to the fact that the coaster
> service runs with the ibm jdk.
>
> I'll run some tests with both and see what happens.
>
> Mihael
>
>
>
> On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > An update on this provider staging related issue: reducing filesize
> > from 17MB to 600KB runs well.
> >
> > So seems like some kind of flow control or buffer management
> > problem, possibly?
> > > > May need to take that problem offline - would be a perfect test > > case for Yadu to develop a new stress test for. > > > > - Mike > > > > > > ----- Forwarded Message ----- > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > Sent: Saturday, March 9, 2013 5:21:49 PM > > Subject: Re: runs for OSG talk > > > > OK, much better: with 600K files (5x5 reduction or 25X smaller) it > > works well, and fast (form midway to beagle!) > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130309-2319-5zq0jrfg > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 > > Submitting:47 Submitted:1 > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 > > Stage in:1 Submitted:47 > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 > > Stage in:47 Active:1 > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 > > Stage in:46 Active:1 Stage out:1 > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 > > Stage in:19 Active:28 Stage out:1 Finished successfully:19 > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 > > Stage in:18 Submitting:21 Active:1 Stage out:7 Finished > > successfully:41 > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 > > Stage in:41 Submitting:1 Active:5 Stage out:1 Finished > > successfully:49 > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 > > Stage in:38 Active:1 Stage out:9 Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 > > Stage in:30 Submitting:8 Stage out:9 Finished successfully:58 > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 > > Stage in:38 Submitting:8 Submitted:1 Finished successfully:67 > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 > > Stage in:19 Stage out:28 
Finished successfully:68 > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 > > Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished > > successfully:97 > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 > > Stage in:31 Submitting:2 Stage out:14 Finished successfully:100 > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 > > Stage in:30 Submitting:10 Stage out:6 Finished successfully:109 > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 > > Stage in:39 Submitting:5 Submitted:3 Active:1 Finished > > successfully:115 > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 > > Stage in:21 Active:10 Stage out:16 Finished successfully:116 > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 > > Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished > > successfully:143 > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 > > Stage in:31 Active:2 Stage out:15 Finished successfully:145 > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 > > Stage in:30 Submitting:14 Stage out:3 Finished successfully:160 > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 > > Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished > > successfully:163 > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 > > Stage in:20 Submitting:2 Active:7 Stage out:19 Finished > > successfully:165 > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished > > successfully:191 > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 > > Stage in:30 Stage out:17 Finished successfully:194 > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 > > Stage in:29 Submitting:18 Active:1 Finished successfully:211 > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 > > Stage in:33 
Active:3 Stage out:12 Finished successfully:211 > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 > > Finished successfully:225 > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 > > Stage in:29 Active:14 Stage out:3 Finished successfully:241 > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 > > Stage in:28 Submitting:2 Stage out:17 Finished successfully:242 > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 > > Stage in:30 Submitting:17 Submitted:1 Finished successfully:259 > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 > > Stage in:35 Stage out:13 Finished successfully:259 > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > > Submitting:6 Submitted:3 Stage out:15 Finished successfully:272 > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 > > Active:5 Stage out:14 Finished successfully:288 > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > successfully:317 > > > > real 0m58.953s > > user 0m32.573s > > sys 0m1.263s > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > /home/wilde/.swift/runs/completed > > midway001$ > > > > > > > > ----- Original Message ----- > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > Subject: Re: runs for OSG talk > > > > > > > > > Yep - I had a version where the input files were in a very > > > similar > > > format (PGM, 1 byte per pixel). I'll add that back, but without > > > the > > > small PGM header in the files. 
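The file-size arithmetic in this exchange (3 bytes per pixel versus 1, and a 5x5 reduction) can be sketched roughly as below. This is only an illustration, not the demo's actual conversion code: the function names are hypothetical, the 2400x2400 tile size is an assumption (the thread gives only the ~17MB file size), and the decimation simply keeps every Nth pixel of a headerless, one-byte-per-pixel raster.

```python
def raster_size(width, height, bytes_per_pixel):
    """Size in bytes of a headerless raster (hypothetical helper)."""
    return width * height * bytes_per_pixel


def decimate(pixels, width, height, factor):
    """Keep every `factor`-th pixel in each dimension.

    `pixels` is assumed to be a headerless raster with one byte per
    pixel, stored row by row. This is plain decimation, not averaging.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    out = bytearray()
    for row in range(0, height, factor):
        start = row * width
        # Slice one row, stepping over `factor` columns at a time.
        out += pixels[start:start + width:factor]
    return bytes(out)
```

Under the assumed 2400x2400 tile, raster_size(2400, 2400, 3) gives 17,280,000 bytes, in line with the ~17MB inputs, and a 5x5 reduction at 3 bytes per pixel gives raster_size(480, 480, 3) = 691,200 bytes, the same order as the ~600K files used in the successful run. Dropping to one byte per pixel would cut that by a further factor of three.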
> > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > Sent: Saturday, March 9, 2013 5:04:43 PM > > > Subject: Re: runs for OSG talk > > > > > > I think we need to cut down the size of these files for a demo > > > (although they are great for a stress test). > > > > > > First, the RGB format by itself uses 3 bytes per pixel when it > > > only > > > needs one (for land use) > > > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). > > > > > > I tried that using simple convert statements, but it always seems > > > to > > > yield a file exactly double what it should be. > > > > > > More on this later; was hoping to get things working "as is" > > > first. > > > > > > I assume you could get the perl code to work on > > > one-byte-per-pixel > > > instead of the default 3 for the convert rgb format? > > > > > > - Mike > > > > > > ----- Original Message ----- > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > That would probably be a good idea for a new script, to show > > > > how to > > > > stage apps like that. For now I updated the scripts on lustre.. > > > > hopefully that helps. > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > OK, I see that its trying to run getlanduse.sh from your > > > > /lustre > > > > dir > > > > on beagle, which is different than the one Ive got checked out. > > > > It > > > > seems to get an error in a stderr redirect??? Let me se what I > > > > need > > > > to do to get the beagle side in sync. 
> > > > > > > > Seems like since these are perl scripts, we should make the > > > > app() > > > > /bin/sh and send the script as data, perhaps? > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > OK, making progress. Now I dialed down the throttle and node > > > > > counts > > > > > to 48 jobs. > > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > > > site:269 > > > > > Submitting:47 Submitted:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > > > site:269 > > > > > Stage in:1 Submitted:47 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > > > site:269 > > > > > Stage in:25 Submitted:23 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > > > site:269 > > > > > Stage in:47 Active:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > > > site:269 > > > > > Stage in:36 Active:12 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > > > site:269 > > > > > Stage in:24 Active:24 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > > > site:269 > > > > > Stage in:24 Active:23 
Stage out:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > > > site:269 > > > > > Stage in:14 Active:33 Stage out:1 > > > > > Execution failed: > > > > > Exception in getlanduse: > > > > > Arguments: > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > Host: beagle > > > > > Directory: > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > Caused by: > > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > failed > > > > > with an exit code of 1 > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > real 2m31.463s > > > > > user 1m33.238s > > > > > sys 0m2.160s > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > /home/wilde/.swift/runs/completed > > > > > midway001$ > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used was > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > Not yet sure what the diff is... 
> > > > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > > (128.135.112.71 > > > > > > > for midway-login1), not a local address or an infiniband > > > > > > > address. > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > I just got it working. I had to adjust for the > > > > > > > differences in > > > > > > > my > > > > > > > username on Beagle/Midway, then I had to set > > > > > > > GLOBUS_HOSTNAME > > > > > > > on > > > > > > > Midway to the IP address, rather than the full hostname > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > > > Yes. 
And I verified that I can ssh to login4 on beagle > > > > > > > from > > > > > > > my > > > > > > > midway > > > > > > > session (as indeed the scp's of the proxy files seem to > > > > > > > be > > > > > > > working) > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > > > Ignore what I said about "problem finding java" - thats > > > > > > > > code > > > > > > > > in > > > > > > > > the > > > > > > > > very long escaped shell command that gets sent to the > > > > > > > > remote > > > > > > > > side. > > > > > > > > I > > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 > > > > > > > > etc > > > > > > > > on > > > > > > > > swift.rcc, and that seems OK. > > > > > > > > > > > > > > > > I exported GLOBUS_HOSTNAME= midway001.rcc.uchicago.edu > > > > > > > > on > > > > > > > > the > > > > > > > > midway > > > > > > > > side. And the beagle side seems to be connecting there. > > > > > > > > > > > > > > > > Im a bit confused about the timestamps I see for the > > > > > > > > proxy > > > > > > > > expiration > > > > > > > > time, but am not yet suspicious of that (although it > > > > > > > > seems > > > > > > > > less > > > > > > > > than > > > > > > > > 5 hours past GMT... not sure.) 
> > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. looking into it > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with > > > > > > > > > finding > > > > > > > > > Java, > > > > > > > > > I > > > > > > > > > assume on beagle, ans also service ending (presumably > > > > > > > > > coaster > > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I > > > > > > > > > think > > > > > > > > > answers > > > > > > > > > my > > > > > > > > > question about security. > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? 
> > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to > > > > > > > > > > work, > > > > > > > > > > same > > > > > > > > > > error > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with automatic > > > > > > > > > > coasters, > > > > > > > > > > what > > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the > > > > > > > > > > midway > > > > > > > > > > hosts > > > > > > > > > > and > > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy > > > > > > > > > > etc? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the default > > > > > > > > > > > templates > > > > > > > > > > > is > > > > > > > > > > > to > > > > > > > > > > > create > > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if > > > > > > > > > > > that's > > > > > > > > > > > what > > > > > > > > > > > you > > > > > > > > > > > mean > > > > > > > > > > > by > > > > > > > > > > > a local sites dir or not). 
But you are right > > > > > > > > > > > about > > > > > > > > > > > Midway > > > > > > > > > > > - > > > > > > > > > > > I > > > > > > > > > > > have > > > > > > > > > > > noticed that when using modis it will sometimes > > > > > > > > > > > get > > > > > > > > > > > stuck > > > > > > > > > > > when > > > > > > > > > > > it > > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > > replication > > > > > > > > > > > would > > > > > > > > > > > be > > > > > > > > > > > able to help better handle that, but I haven't > > > > > > > > > > > had > > > > > > > > > > > much > > > > > > > > > > > luck > > > > > > > > > > > with > > > > > > > > > > > that yet. Another way around this may be to add > > > > > > > > > > > this > > > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > > swift-devel > > > > > > > > > > > for > > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > > relatively > > > > > > > > > > > simple > > > > > > > > > > > though.. probably worth fixing before release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. 
> > > > > > > > > > > Feel free to stay Tue night to avoid a 4hr drive after a long day.
> > > > > > > > > > >
> > > > > > > > > > > I'm trying the modis demo.
> > > > > > > > > > >
> > > > > > > > > > > I tried to create a local sites/ dir so I can modify the sites templates; that's not working for me either yet.
> > > > > > > > > > >
> > > > > > > > > > > For midway, need to force to westmere or sandyb (but not both) and ensure 1-node jobs, because either queue can get filled and not yield an idle node for a long time. Maybe need to fiddle jobsPerNode to get at least 1 core when the system is busy and *pretend* that it's a node.
> > > > > > > > > > >
> > > > > > > > > > > So to get response I tried beagle-ssh; that isn't working because the template sites file is wrong in swift 0.94 rc4.
> > > > > > > > > > >
> > > > > > > > > > > I also see that swift.log is still getting produced - I thought we eliminated that. Did it come back due to a problem with that fix?
> > > > > > > > > > >
> > > > > > > > > > > I'll keep hacking; suggestions welcome.
> > > > > > > > > > >
> > > > > > > > > > > - Mike
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Mike,
> > > > > > > > > > > >
> > > > > > > > > > > > Looking more closely at the agenda, I think the most interesting/useful talks will be on Tuesday. Monday I'll come to Argonne to work on any loose ends and put the finishing touches on any slides/runs/scripts, then drive to Indianapolis on Monday afternoon/evening. I have a hotel booked for Monday night.
> > > > > > > > > > > >
> > > > > > > > > > > > I'll do some runs using the routes we talked about. I'm pretty sure I have working configurations for everything we talked about, so I think it's really just a matter of plugging in the apps.
> > > > > > > > > > > >
> > > > > > > > > > > > David
> > > > > > > > > > > >
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > >
> > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM
> > > > > > > > > > > > Subject: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > Hi David,
> > > > > > > > > > > >
> > > > > > > > > > > > I just wanted to let you know that I'm looking into the run options now. I'm hoping to try a few... Will see how much help I need. Have you decided on a driving time and made hotel arrangements?
> > > > > > > > > > > >
> > > > > > > > > > > > I would feel free to stay for whatever portion of the OSG meeting you feel is of value. The only thing I ask is that for Wed and Thu you stay available online for user-support or other assistance needs that come up here. And that you engage with people that can help us develop the Swift user community and reliable OSG usage.
> > > > > > > > > > > > Rob, Marco, Lincoln, and Suchandra would be good to hang out with and they can introduce you to good contacts.
> > > > > > > > > > > >
> > > > > > > > > > > > Of course we will cover your expenses via a UChicago travel expense report.
> > > > > > > > > > > >
> > > > > > > > > > > > We'll be starting a project with a tiny bit of additional ExTENCI funds to make Swift do smarter data management on OSG sites (and in general) so anything you learn about OSG storage elements/services/tools will be valuable for that (srmcp, lcgcp, etc).
> > > > > > > > > > > >
> > > > > > > > > > > > Between now and your talk, let's just focus on the talk, OK? I'm hoping we have slides frozen by Monday.
> > > > > > > > > > > >
> > > > > > > > > > > > While I fiddle, if you could do catsn or other hello-world-like tests to cover the "routes" we discussed, that would pave the way for plugging in the real app examples.
> > > > > > > > > > > >
> > > > > > > > > > > > Sound good?
> > > > > > > > > > > > Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > >
> > > > > > > > > > > > - Mike
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > Argonne National Laboratory
>
> --
> Ketan

From tim.g.armstrong at gmail.com Sun Mar 10 11:10:45 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Sun, 10 Mar 2013 11:10:45 -0500
Subject: [Swift-devel] Java versions for testing - Re: Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1318332938.1274845.1362931526599.JavaMail.root@mcs.anl.gov>
References: <1318332938.1274845.1362931526599.JavaMail.root@mcs.anl.gov>
Message-ID:

YMMV but for other projects I've not found any compatibility issues between OpenJDK and Oracle's java - their codebases are mostly the same anyway now.
I don't think there are really any other open source jdks in wide use (GCJ is pretty much a dead project, about a decade behind the Java standard).

- Tim

On Sun, Mar 10, 2013 at 11:05 AM, Michael Wilde wrote:

> Good point, Ketan. We should document this on the site guide for Beagle.
>
> We should also file a bug to fix this some day (i.e. find out why we don't work well with this java and if possible fix it). But that's very low prio; for now we want to make sure all Swift users run on Sun Java 1.7.
>
> I had sun java in my .modules file. Mihael has now added the -l flag to make sure that automatic coaster bootstrap runs the user's init to pick up such things.
>
> Yadu, this raises another issue for the test plan and suite: testing on multiple Javas and carefully deciding and controlling the Java we use. More important than this esoteric Cray IBM Java is validating Swift on popular open source Javas.
>
> - Mike
>
> ----- Original Message -----
> > From: "Ketan Maheshwari"
> > To: "Mihael Hategan"
> > Cc: "Michael Wilde" , "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Saturday, March 9, 2013 8:24:56 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> >
> > IBM jdk on beagle is known to not function well with Swift coasters. We had to switch to Sun jdk for ssh:pbs runs from bridled/communicado.
> >
> > On Sat, Mar 9, 2013 at 7:43 PM, Mihael Hategan < hategan at mcs.anl.gov > wrote:
> >
> > I noticed some random weirdness due to the fact that the coaster service runs with the ibm jdk.
> >
> > I'll run some tests with both and see what happens.
> >
> > Mihael
> >
> > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > > An update on this provider staging related issue: reducing filesize from 17MB to 600KB runs well.
> > >
> > > So seems like some kind of flow control or buffer management problem, possibly?
> > >
> > > May need to take that problem offline - would be a perfect test case for Yadu to develop a new stress test for.
> > >
> > > - Mike
> > >
> > > ----- Forwarded Message -----
> > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > Sent: Saturday, March 9, 2013 5:21:49 PM
> > > Subject: Re: runs for OSG talk
> > >
> > > OK, much better: with 600K files (5x5 reduction or 25X smaller) it works well, and fast (from midway to beagle!)
> > >
> > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> > >
> > > RunID: 20130309-2319-5zq0jrfg
> > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000
> > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 Stage in:47 Active:1
> > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 Stage in:46 Active:1 Stage out:1
> > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 Stage in:19 Active:28 Stage out:1 Finished successfully:19
> > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41
> > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49
> > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 Stage in:38 Active:1 Stage out:9 Finished successfully:49
> > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 Stage in:30 Submitting:8 Stage out:9 Finished successfully:58
> > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 Stage in:38 Submitting:8 Submitted:1 Finished successfully:67
> > > Progress: time:
Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 Stage in:19 Stage out:28 Finished successfully:68
> > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished successfully:97
> > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 Stage in:31 Submitting:2 Stage out:14 Finished successfully:100
> > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 Stage in:30 Submitting:10 Stage out:6 Finished successfully:109
> > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 Finished successfully:115
> > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 Stage in:21 Active:10 Stage out:16 Finished successfully:116
> > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished successfully:143
> > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 Stage in:31 Active:2 Stage out:15 Finished successfully:145
> > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 Stage in:30 Submitting:14 Stage out:3 Finished successfully:160
> > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished successfully:163
> > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 Finished successfully:165
> > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished successfully:191
> > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 Stage in:30 Stage out:17 Finished successfully:194
> > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58
Stage in:29 Submitting:18 Active:1 Finished successfully:211
> > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 Stage in:33 Active:3 Stage out:12 Finished successfully:211
> > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished successfully:225
> > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 Stage in:29 Active:14 Stage out:3 Finished successfully:241
> > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 Stage in:28 Submitting:2 Stage out:17 Finished successfully:242
> > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 Stage in:30 Submitting:17 Submitted:1 Finished successfully:259
> > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 Stage in:35 Stage out:13 Finished successfully:259
> > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 Submitting:6 Submitted:3 Stage out:15 Finished successfully:272
> > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 Stage out:14 Finished successfully:288
> > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317
> > >
> > > real 0m58.953s
> > > user 0m32.573s
> > > sys 0m1.263s
> > > + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed
> > > midway001$
> > >
> > > ----- Original Message -----
> > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > Sent: Saturday, March 9, 2013 5:12:59 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > > Yep - I had a version where the input files were in a very similar format (PGM, 1 byte per pixel). I'll add that back, but without the small PGM header in the files.
> > > >
> > > > ----- Original Message -----
> > > >
> > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > Sent: Saturday, March 9, 2013 5:04:43 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > > I think we need to cut down the size of these files for a demo (although they are great for a stress test).
> > > >
> > > > First, the RGB format by itself uses 3 bytes per pixel when it only needs one (for land use).
> > > >
> > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4).
> > > >
> > > > I tried that using simple convert statements, but it always seems to yield a file exactly double what it should be.
> > > >
> > > > More on this later; was hoping to get things working "as is" first.
> > > >
> > > > I assume you could get the perl code to work on one-byte-per-pixel instead of the default 3 for the convert rgb format?
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > That would probably be a good idea for a new script, to show how to stage apps like that. For now I updated the scripts on lustre.. hopefully that helps.
> > > > >
> > > > > ----- Original Message -----
> > > > >
> > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > OK, I see that it's trying to run getlanduse.sh from your /lustre dir on beagle, which is different than the one I've got checked out.
> > > > > It seems to get an error in a stderr redirect??? Let me see what I need to do to get the beagle side in sync.
> > > > >
> > > > > Seems like since these are perl scripts, we should make the app() /bin/sh and send the script as data, perhaps?
> > > > >
> > > > > - Mike
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > > OK, making progress. Now I dialed down the throttle and node counts to 48 jobs.
> > > > > >
> > > > > > Now I get further, for ./demo and site=4 script=2:
> > > > > >
> > > > > > RunID: 20130309-2214-1oi3rvea
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36
Active:12
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
> > > > > > Execution failed:
> > > > > > Exception in getlanduse:
> > > > > > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > > > > > Host: beagle
> > > > > > Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > > > > >
> > > > > > Caused by:
> > > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > > > > > getLandUse, modis02.swift, line 20
> > > > > >
> > > > > > real 2m31.463s
> > > > > > user 1m33.238s
> > > > > > sys 0m2.160s
> > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> > > > > > midway001$
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > > ok, I'll take a look at that. The run dir I used was /scratch/midway/davidkelly999/modis/run011
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >
> > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > > I just tried this, but didn't work - same prob.
> > > > > > >
> > > > > > > But if it's working for you now, we must be close.
> > > > > > >
> > > > > > > Not yet sure what the diff is...
> > > > > > >
> > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > > >
> > > > > > > - Mike
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > Had to make sure I was using the IP address on eth4 (128.135.112.71 for midway-login1), not a local address or an infiniband address.
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > >
> > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > I just got it working.
> > > > > > > > I had to adjust for the differences in my username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME on Midway to the IP address, rather than the full hostname.
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > >
> > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > >
> > > > > > > > > Is your username the same on beagle and midway?
> > > > > > > >
> > > > > > > > Yes. And I verified that I can ssh to login4 on beagle from my midway session (as indeed the scp's of the proxy files seem to be working)
> > > > > > > >
> > > > > > > > - Mike
> > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > >
> > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > >
> > > > > > > > > OK.
> > > > > > > > >
> > > > > > > > > Ignore what I said about "problem finding java" - that's code in the very long escaped shell command that gets sent to the remote side.
> > > > > > > > > I don't *think* that's the problem.
> > > > > > > > >
> > > > > > > > > I also verified that beagle can connect to ports 50001 etc on swift.rcc, and that seems OK.
> > > > > > > > >
> > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on the midway side. And the beagle side seems to be connecting there.
> > > > > > > > >
> > > > > > > > > I'm a bit confused about the timestamps I see for the proxy expiration time, but am not yet suspicious of that (although it seems less than 5 hours past GMT... not sure.)
> > > > > > > > >
> > > > > > > > > - Mike
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > > I'm seeing the same error now.. looking into it
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > >
> > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > > Looking deeper I see that the logs show problems with finding Java, I assume on beagle, and also service ending (presumably coaster service on midway host).
> > > > > > > > > >
> > > > > > > > > > I'll dig into these two.
> > > > > > > > > >
> > > > > > > > > > I see that it scp's the proxies to beagle which I think answers my question about security.
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > > >
> > > > > > > > > > > I've been experimenting but still can't get it to work; same error (can't connect to bootstrap port).
> > > > > > > > > > >
> > > > > > > > > > > When you tried ssh-cl to beagle with automatic coasters, what configuration (sites, env, etc.) did you use?
> > > > > > > > > > >
> > > > > > > > > > > I verified that beagle can connect back to the midway hosts and ports.
> > > > > > > > > > >
> > > > > > > > > > > Do we need to specify security or create a proxy etc.?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > - Mike
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > One way you can override/customize the default templates is to create them in $HOME/.swift/sites (I'm not sure if that's what you mean by a local sites dir or not). But you are right about Midway - I have noticed that when using modis it will sometimes get stuck when it goes to a queue that is busy. Ideally swift replication would be able to help better handle that, but I haven't had much luck with that yet.
> > > > > > > > > > > > Another way around this may be to add this to the template:
> > > > > > > > > > > >
> > > > > > > > > > > > key="slurm.exclusive">false
> > > > > > > > > > > >
> > > > > > > > > > > > The swift.log issue was never fixed. It went to swift-devel for discussion but was never fixed. I think it is relatively simple though.. probably worth fixing before release.
> > > > > > > > > > > >
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > >
> > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay Tue night to avoid a 4hr drive after a long day.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm trying the modis demo.
> > > > > > > > > > > >
> > > > > > > > > > > > I tried to create a local sites/ dir so I can modify the sites templates; that's not working for me either yet.
> > > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb > > > > > > > > > > > > (but > > > > > > > > > > > > not > > > > > > > > > > > > both) > > > > > > > > > > > > and > > > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > > > filled > > > > > > > > > > > > and > > > > > > > > > > > > not > > > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > > > fiddle > > > > > > > > > > > > jobsPerNode > > > > > > > > > > > > to get at least 1 core when the system is busy > > > > > > > > > > > > and > > > > > > > > > > > > *pretend* > > > > > > > > > > > > that > > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > > > working > > > > > > > > > > > > because > > > > > > > > > > > > the > > > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > > produced - > > > > > > > > > > > > I > > > > > > > > > > > > thought > > > > > > > > > > > > we > > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > > problem > > > > > > > > > > > > with > > > > > > > > > > > > that > > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > > > most > > > > > > > > > > > > > interesting/useful talks will be on Tuesday. > > > > > > > > > > > > > Monday > > > > > > > > > > > > > I'll > > > > > > > > > > > > > come > > > > > > > > > > > > > to > > > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > > > finishing > > > > > > > > > > > > > touches > > > > > > > > > > > > > on > > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > > on > > > > > > > > > > > > > Monday > > > > > > > > > > > > > afternoon/evening. I have a hotel booked for > > > > > > > > > > > > > Monday > > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked > > > > > > > > > > > > > about. > > > > > > > > > > > > > I'm > > > > > > > > > > > > > pretty > > > > > > > > > > > > > sure > > > > > > > > > > > > > I > > > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > > > talked > > > > > > > > > > > > > about, > > > > > > > > > > > > > so > > > > > > > > > > > > > I > > > > > > > > > > > > > think it's really just a matter of plugging in > > > > > > > > > > > > > the > > > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking > > > > > > > > > > > > > into > > > > > > > > > > > > > the > > > > > > > > > > > > > run > > > > > > > > > > > > > options > > > > > > > > > > > > > now. Im hoping to try a few... WIll see how > > > > > > > > > > > > > much > > > > > > > > > > > > > help > > > > > > > > > > > > > I > > > > > > > > > > > > > need. > > > > > > > > > > > > > Have > > > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion > > > > > > > > > > > > > of > > > > > > > > > > > > > the > > > > > > > > > > > > > OSG > > > > > > > > > > > > > meeting > > > > > > > > > > > > > you > > > > > > > > > > > > > feel is of value. The only thing I ask is that > > > > > > > > > > > > > for > > > > > > > > > > > > > Wed > > > > > > > > > > > > > and > > > > > > > > > > > > > Thu > > > > > > > > > > > > > you > > > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > > > assistance > > > > > > > > > > > > > needs > > > > > > > > > > > > > that come up here. 
And that you engage with > > > > > > > > > > > > > people > > > > > > > > > > > > > that > > > > > > > > > > > > > can > > > > > > > > > > > > > help > > > > > > > > > > > > > us > > > > > > > > > > > > > develop the Swift user community and reliable > > > > > > > > > > > > > OSG > > > > > > > > > > > > > usage. > > > > > > > > > > > > > Rob, > > > > > > > > > > > > > Marco, > > > > > > > > > > > > > Lincoln, and Suchandra would be good to hang > > > > > > > > > > > > > out > > > > > > > > > > > > > with > > > > > > > > > > > > > and > > > > > > > > > > > > > they > > > > > > > > > > > > > can > > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > > UChicago > > > > > > > > > > > > > travel > > > > > > > > > > > > > expense > > > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > > > additional > > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > > funds to make Swift do smarter data management > > > > > > > > > > > > > on > > > > > > > > > > > > > OSG > > > > > > > > > > > > > sites > > > > > > > > > > > > > (and > > > > > > > > > > > > > in > > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > > storage > > > > > > > > > > > > > elements/services/tools will be valuable for > > > > > > > > > > > > > that > > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on > > > > > > > > > > > > > the > > > > > > > > > > > > > talk, > > > > > > > > > > > > > OK? > > > > > > > > > > > > > Im > > > > > > > > > > > > > hoping > > > > > > > > > > > > > we have slides frozen by Monday. 
> > > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > > tests > > > > > > > > > > > > > to cover the "routes" we discussed, that would > > > > > > > > > > > > > pave > > > > > > > > > > > > > the > > > > > > > > > > > > > way > > > > > > > > > > > > > for > > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > > > Sound good? Let me know of any concerns (other > > > > > > > > > > > > > than > > > > > > > > > > > > > the > > > > > > > > > > > > > fact > > > > > > > > > > > > > that > > > > > > > > > > > > > this is a tad rushed ;) > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks and regards, > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > Michael Wilde > > > > > > > > > > > > > Computation Institute, University of Chicago > > > > > > > > > > > > > Mathematics and Computer Science Division > > > > > > > > > > > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > -- > > Ketan > > > > > _______________________________________________ > Swift-devel mailing 
list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Sun Mar 10 11:18:00 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 11:18:00 -0500 (CDT)
Subject: [Swift-devel] a note on stress testing
In-Reply-To:
Message-ID: <1691597092.1275063.1362932280635.JavaMail.root@mcs.anl.gov>

> ... my understanding was that we'd need three types of
> runs based on where swift runs:
> -> Local | Local ( Swift runs locally with jobs also run locally )
> -> Local | Remote ( Swift runs locally but the compute resources
> are remote providers ? )
> -> Remote | Local ( Swift instantiated on a remote system to
> coordinate resources locally )

Right.

> In the case you have pointed out, Swift on Midway submits jobs to
> beagle.
> Is this a remote|remote type which I haven't listed?

Not by itself: it's "local|remote". But you can *initiate* this test remotely, making it "remote|remote".

So while all 4 boxes of this 2x2 matrix make sense, ultimately you could think of all the tests being initiated from a test service like Jenkins, so *all* tests would be "remote" in column one of your list above. That simplifies the way you can organize the test plan and execution mechanism.

> Can we expect to
> see the same errors,
> if we were to run swift on an MCS machine to submit the same jobs to
> beagle?

Yes, I think in this particular case, the main errors were not caused by the environment in the submitting system (midway hosts). But they very well *could have been*.

For example, if we only test on hosts that have single network interfaces, we'd never see known errors related to not handling the multiple-NIC case correctly.
And in fact in our testing yesterday this was indeed an issue that's still unresolved: whether GLOBUS_HOSTNAME or the equivalent sites.xml tag needs to be set to a dotted IP or could accept a DNS FQDN, and under what circumstances. This needs both testing *and* documentation.

Similar issues apply to hosts with various firewall/iptables settings.

- Mike

From wilde at mcs.anl.gov Sun Mar 10 11:23:04 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 11:23:04 -0500 (CDT)
Subject: [Swift-devel] Java versions for testing - Re: Cant get auto-coasters to run from midway to beagle
In-Reply-To:
Message-ID: <240807962.1275243.1362932584507.JavaMail.root@mcs.anl.gov>

OK, that's good to know. But note that we've seen cases where Swift failed on a specific Oracle Java 1.6 build, due to the behavior of some esoteric class. So testing on both OpenJDK and Oracle still makes sense, and even more importantly, telling the user what specific Java release a given Swift revision was tested under will help eliminate some variables affecting reliability.

- Mike

----- Original Message -----
> From: "Tim Armstrong"
> To: "Michael Wilde"
> Cc: "Swift Devel"
> Sent: Sunday, March 10, 2013 11:10:45 AM
> Subject: Re: [Swift-devel] Java versions for testing - Re: Cant get auto-coasters to run from midway to beagle
>
> YMMV but for other projects I've not found any compatibility issues
> between OpenJDK and Oracle's java - their codebases are mostly the
> same anyway now. I don't think there are really any other open
> source jdks in wide use (GCJ is pretty much a dead project, about a
> decade behind the Java standard).
>
> - Tim
>
> On Sun, Mar 10, 2013 at 11:05 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
> Good point, Ketan. We should document this on the site guide for
> Beagle.
>
> We should also file a bug to fix this some day (ie find out why we
> dont work well with this java and if possible fix it.
> But thats very low prio; for now we want to make sure all Swift
> users run on Sun Java 1.7.
>
> I had sun java in my .modules file. Mihael has now added the -l flag
> to make sure that automatic coaster bootstrap runs the user's init
> to pick up such things.
>
> Yadu, this raises another issue for the test plan and suite: testing
> on multiple Javas and carefully deciding and controlling the Java we
> use. More important than this esoteric Cray IBM Java is validating
> Swift on popular open source Javas.
>
> - Mike
>
> ----- Original Message -----
> > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com >
> > To: "Mihael Hategan" < hategan at mcs.anl.gov >
> > Cc: "Michael Wilde" < wilde at mcs.anl.gov >, "Swift Devel" < swift-devel at ci.uchicago.edu >
> > Sent: Saturday, March 9, 2013 8:24:56 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> >
> > IBM jdk on beagle is known to not function well with Swift coasters.
> > We had to switch to Sun jdk for ssh:pbs runs from
> > bridled/communicado.
> >
> > On Sat, Mar 9, 2013 at 7:43 PM, Mihael Hategan < hategan at mcs.anl.gov > wrote:
> >
> > I noticed some random weirdness due to the fact that the coaster
> > service runs with the ibm jdk.
> >
> > I'll run some tests with both and see what happens.
> >
> > Mihael
> >
> > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > > An update on this provider staging related issue: reducing
> > > filesize from 17MB to 600KB runs well.
> > >
> > > So seems like some kind of flow control or buffer management
> > > problem, possibly?
> > >
> > > May need to take that problem offline - would be a perfect test
> > > case for Yadu to develop a new stress test for.
> > >
> > > - Mike
> > >
> > > ----- Forwarded Message -----
> > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > Sent: Saturday, March 9, 2013 5:21:49 PM
> > > Subject: Re: runs for OSG talk
> > >
> > > OK, much better: with 600K files (5x5 reduction or 25X smaller) it
> > > works well, and fast (from midway to beagle!)
> > >
> > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> > >
> > > RunID: 20130309-2319-5zq0jrfg
> > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000
> > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 Stage in:47 Active:1
> > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 Stage in:46 Active:1 Stage out:1
> > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 Stage in:19 Active:28 Stage out:1 Finished successfully:19
> > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41
> > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49
> > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 Stage in:38 Active:1 Stage out:9 Finished successfully:49
> > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 Stage in:30 Submitting:8 Stage out:9 Finished successfully:58
> > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 Stage in:38 Submitting:8 Submitted:1 Finished successfully:67
> > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 Stage in:19 Stage out:28 Finished successfully:68
> > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished successfully:97
> > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 Stage in:31 Submitting:2 Stage out:14 Finished successfully:100
> > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 Stage in:30 Submitting:10 Stage out:6 Finished successfully:109
> > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 Finished successfully:115
> > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 Stage in:21 Active:10 Stage out:16 Finished successfully:116
> > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished successfully:143
> > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 Stage in:31 Active:2 Stage out:15 Finished successfully:145
> > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 Stage in:30 Submitting:14 Stage out:3 Finished successfully:160
> > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished successfully:163
> > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 Finished successfully:165
> > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished successfully:191
> > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 Stage in:30 Stage out:17 Finished successfully:194
> > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 Stage in:29 Submitting:18 Active:1 Finished successfully:211
> > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 Stage in:33 Active:3 Stage out:12 Finished successfully:211
> > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished successfully:225
> > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 Stage in:29 Active:14 Stage out:3 Finished successfully:241
> > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 Stage in:28 Submitting:2 Stage out:17 Finished successfully:242
> > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 Stage in:30 Submitting:17 Submitted:1 Finished successfully:259
> > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 Stage in:35 Stage out:13 Finished successfully:259
> > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 Submitting:6 Submitted:3 Stage out:15 Finished successfully:272
> > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 Stage out:14 Finished successfully:288
> > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317
> > >
> > > real 0m58.953s
> > > user 0m32.573s
> > > sys 0m1.263s
> > > + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed
> > > midway001$
> > >
> > > ----- Original Message -----
> > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > Sent: Saturday, March 9, 2013 5:12:59 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > > Yep - I had a version where the input files were in a very similar
> > > > format (PGM, 1 byte per pixel). I'll add that back, but without the
> > > > small PGM header in the files.
> > > >
> > > > ----- Original Message -----
> > > >
> > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > Sent: Saturday, March 9, 2013 5:04:43 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > > I think we need to cut down the size of these files for a demo
> > > > (although they are great for a stress test).
> > > >
> > > > First, the RGB format by itself uses 3 bytes per pixel when it only
> > > > needs one (for land use)
> > > >
> > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4).
> > > >
> > > > I tried that using simple convert statements, but it always seems
> > > > to yield a file exactly double what it should be.
> > > >
> > > > More on this later; was hoping to get things working "as is" first.
> > > >
> > > > I assume you could get the perl code to work on one-byte-per-pixel
> > > > instead of the default 3 for the convert rgb format?
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > That would probably be a good idea for a new script, to show how
> > > > > to stage apps like that. For now I updated the scripts on lustre..
> > > > > hopefully that helps.
> > > > >
> > > > > ----- Original Message -----
> > > > >
> > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > OK, I see that its trying to run getlanduse.sh from your /lustre
> > > > > dir on beagle, which is different than the one Ive got checked out.
> > > > > It seems to get an error in a stderr redirect??? Let me see what I
> > > > > need to do to get the beagle side in sync.
> > > > >
> > > > > Seems like since these are perl scripts, we should make the app()
> > > > > /bin/sh and send the script as data, perhaps?
> > > > >
> > > > > - Mike
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > > OK, making progress. Now I dialed down the throttle and node
> > > > > > counts to 48 jobs.
> > > > > >
> > > > > > Now I get further, for ./demo and site=4 script=2:
> > > > > >
> > > > > > RunID: 20130309-2214-1oi3rvea
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23
> > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36 Active:12
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
> > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
> > > > > > Execution failed:
> > > > > > Exception in getlanduse:
> > > > > > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > > > > > Host: beagle
> > > > > > Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > > > > >
> > > > > > Caused by:
> > > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > > > > > getLandUse, modis02.swift, line 20
> > > > > >
> > > > > > real 2m31.463s
> > > > > > user 1m33.238s
> > > > > > sys 0m2.160s
> > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> > > > > > midway001$
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > > ok, I'll take a look at that. The run dir I used was
> > > > > > > /scratch/midway/davidkelly999/modis/run011
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >
> > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > > I just tried this, but didnt work - same prob.
> > > > > > >
> > > > > > > But if its working for you now, we must be close.
> > > > > > >
> > > > > > > Not yet sure what the diff is...
> > > > > > >
> > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > > >
> > > > > > > - Mike
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > Had to make sure I was using the IP address on eth4
> > > > > > > > (128.135.112.71 for midway-login1), not a local address or an
> > > > > > > > infiniband address.
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > >
> > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > I just got it working.
> > > > > > > > I had to adjust for the differences in my username on
> > > > > > > > Beagle/Midway, then I had to set GLOBUS_HOSTNAME on Midway to the
> > > > > > > > IP address, rather than the full hostname
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > >
> > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > >
> > > > > > > > > Is your username the same on beagle and midway?
> > > > > > > >
> > > > > > > > Yes. And I verified that I can ssh to login4 on beagle from my
> > > > > > > > midway session (as indeed the scp's of the proxy files seem to be
> > > > > > > > working)
> > > > > > > >
> > > > > > > > - Mike
> > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > >
> > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > >
> > > > > > > > > OK.
> > > > > > > > >
> > > > > > > > > Ignore what I said about "problem finding java" - thats code in
> > > > > > > > > the very long escaped shell command that gets sent to the remote
> > > > > > > > > side. I dont *think* thats the problem.
> > > > > > > > >
> > > > > > > > > I also verified that beagle can connect to ports 50001 etc on
> > > > > > > > > swift.rcc, and that seems OK.
> > > > > > > > >
> > > > > > > > > I exported GLOBUS_HOSTNAME= midway001.rcc.uchicago.edu on the
> > > > > > > > > midway side. And the beagle side seems to be connecting there.
> > > > > > > > >
> > > > > > > > > Im a bit confused about the timestamps I see for the proxy
> > > > > > > > > expiration time, but am not yet suspicious of that (although it
> > > > > > > > > seems less than 5 hours past GMT... not sure.)
> > > > > > > > >
> > > > > > > > > - Mike
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > > I'm seeing the same error now.. looking into it
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > >
> > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > > Looking deeper I see that the logs show problems with finding
> > > > > > > > > > Java, I assume on beagle, and also service ending (presumably
> > > > > > > > > > coaster service on midway host).
> > > > > > > > > >
> > > > > > > > > > I'll dig into these two.
> > > > > > > > > >
> > > > > > > > > > I see that it scp's the proxies to beagle which I think answers
> > > > > > > > > > my question about security.
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > > >
> > > > > > > > > > > Ive been experimenting but still cant get it to work, same error
> > > > > > > > > > > (cant connect to bootstrap port)
> > > > > > > > > > >
> > > > > > > > > > > When you tried ssh-cl to beagle with automatic coasters, what
> > > > > > > > > > > configuration (sites env etc) did you use?
> > > > > > > > > > >
> > > > > > > > > > > I verified that beagle can connect back to the midway hosts and
> > > > > > > > > > > ports.
> > > > > > > > > > >
> > > > > > > > > > > Do we need to specify security or create a proxy etc?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > - Mike
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > One way you can override/customize the default templates is to
> > > > > > > > > > > > create them in $HOME/.swift/sites (I'm not sure if that's what
> > > > > > > > > > > > you mean by a local sites dir or not). But you are right about
> > > > > > > > > > > > Midway - I have noticed that when using modis it will sometimes
> > > > > > > > > > > > get stuck when it goes to a queue that is busy.
> > > > > > > > > > > > Ideally swift replication would be able to help better handle
> > > > > > > > > > > > that, but I haven't had much luck with that yet. Another way
> > > > > > > > > > > > around this may be to add this to the template:
> > > > > > > > > > > >
> > > > > > > > > > > > key="slurm.exclusive">false
> > > > > > > > > > > >
> > > > > > > > > > > > The swift.log issue was never fixed. It went to swift-devel for
> > > > > > > > > > > > discussion but was never fixed. I think it is relatively simple
> > > > > > > > > > > > though.. probably worth fixing before release.
> > > > > > > > > > > >
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > >
> > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov >
> > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu >
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay Tue night to
> > > > > > > > > > > > avoid a 4hr drive after a long day.
> > > > > > > > > > > >
> > > > > > > > > > > > Im trying the modis demo.
> > > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can > > > > > > > > > > > > modify > > > > > > > > > > > > the > > > > > > > > > > > > sites > > > > > > > > > > > > templates; thats not working for me either yet. > > > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb > > > > > > > > > > > > (but > > > > > > > > > > > > not > > > > > > > > > > > > both) > > > > > > > > > > > > and > > > > > > > > > > > > ensure 1-node jobs, because either queue can > > > > > > > > > > > > get > > > > > > > > > > > > filled > > > > > > > > > > > > and > > > > > > > > > > > > not > > > > > > > > > > > > yield an idle node for a long time. maybe need > > > > > > > > > > > > to > > > > > > > > > > > > fiddle > > > > > > > > > > > > jobsPerNode > > > > > > > > > > > > to get at least 1 core when the system is busy > > > > > > > > > > > > and > > > > > > > > > > > > *pretend* > > > > > > > > > > > > that > > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That > > > > > > > > > > > > isnt > > > > > > > > > > > > working > > > > > > > > > > > > because > > > > > > > > > > > > the > > > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > > produced - > > > > > > > > > > > > I > > > > > > > > > > > > thought > > > > > > > > > > > > we > > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > > problem > > > > > > > > > > > > with > > > > > > > > > > > > that > > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think > > > > > > > > > > > > > the > > > > > > > > > > > > > most > > > > > > > > > > > > > interesting/useful talks will be on Tuesday. > > > > > > > > > > > > > Monday > > > > > > > > > > > > > I'll > > > > > > > > > > > > > come > > > > > > > > > > > > > to > > > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > > > finishing > > > > > > > > > > > > > touches > > > > > > > > > > > > > on > > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > > on > > > > > > > > > > > > > Monday > > > > > > > > > > > > > afternoon/evening. I have a hotel booked for > > > > > > > > > > > > > Monday > > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked > > > > > > > > > > > > > about. 
> > > > > > > > > > > > > I'm > > > > > > > > > > > > > pretty > > > > > > > > > > > > > sure > > > > > > > > > > > > > I > > > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > > > talked > > > > > > > > > > > > > about, > > > > > > > > > > > > > so > > > > > > > > > > > > > I > > > > > > > > > > > > > think it's really just a matter of plugging > > > > > > > > > > > > > in > > > > > > > > > > > > > the > > > > > > > > > > > > > apps. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" < wilde at mcs.anl.gov > > > > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking > > > > > > > > > > > > > into > > > > > > > > > > > > > the > > > > > > > > > > > > > run > > > > > > > > > > > > > options > > > > > > > > > > > > > now. Im hoping to try a few... WIll see how > > > > > > > > > > > > > much > > > > > > > > > > > > > help > > > > > > > > > > > > > I > > > > > > > > > > > > > need. > > > > > > > > > > > > > Have > > > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever > > > > > > > > > > > > > portion > > > > > > > > > > > > > of > > > > > > > > > > > > > the > > > > > > > > > > > > > OSG > > > > > > > > > > > > > meeting > > > > > > > > > > > > > you > > > > > > > > > > > > > feel is of value. 
The only thing I ask is > > > > > > > > > > > > > that > > > > > > > > > > > > > for > > > > > > > > > > > > > Wed > > > > > > > > > > > > > and > > > > > > > > > > > > > Thu > > > > > > > > > > > > > you > > > > > > > > > > > > > stay available online for user-support or > > > > > > > > > > > > > other > > > > > > > > > > > > > assistance > > > > > > > > > > > > > needs > > > > > > > > > > > > > that come up here. And that you engage with > > > > > > > > > > > > > people > > > > > > > > > > > > > that > > > > > > > > > > > > > can > > > > > > > > > > > > > help > > > > > > > > > > > > > us > > > > > > > > > > > > > develop the Swift user community and reliable > > > > > > > > > > > > > OSG > > > > > > > > > > > > > usage. > > > > > > > > > > > > > Rob, > > > > > > > > > > > > > Marco, > > > > > > > > > > > > > Lincoln, and Suchandra would be good to hang > > > > > > > > > > > > > out > > > > > > > > > > > > > with > > > > > > > > > > > > > and > > > > > > > > > > > > > they > > > > > > > > > > > > > can > > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > > UChicago > > > > > > > > > > > > > travel > > > > > > > > > > > > > expense > > > > > > > > > > > > > report. 
> > > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit > > > > > > > > > > > > > of > > > > > > > > > > > > > additional > > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > > funds to make Swift do smarter data > > > > > > > > > > > > > management > > > > > > > > > > > > > on > > > > > > > > > > > > > OSG > > > > > > > > > > > > > sites > > > > > > > > > > > > > (and > > > > > > > > > > > > > in > > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > > storage > > > > > > > > > > > > > elements/services/tools will be valuable for > > > > > > > > > > > > > that > > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on > > > > > > > > > > > > > the > > > > > > > > > > > > > talk, > > > > > > > > > > > > > OK? > > > > > > > > > > > > > Im > > > > > > > > > > > > > hoping > > > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or > > > > > > > > > > > > > other > > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > > tests > > > > > > > > > > > > > to cover the "routes" we discussed, that > > > > > > > > > > > > > would > > > > > > > > > > > > > pave > > > > > > > > > > > > > the > > > > > > > > > > > > > way > > > > > > > > > > > > > for > > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Mike
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > > Argonne National Laboratory
> >
> > --
> > Ketan

From wilde at mcs.anl.gov Sun Mar 10 11:25:19 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 11:25:19 -0500 (CDT)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1362900986.29839.4.camel@echo>
Message-ID: <127013041.1275274.1362932719417.JavaMail.root@mcs.anl.gov>

OK, great - thanks for the
quick fix! I'll test now.

Related: is that message about seemingly-past proxy expiration time an issue? It seemed very strange.

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Michael Wilde"
> Cc: "Swift Devel"
> Sent: Sunday, March 10, 2013 1:36:26 AM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
> Please try now. I made some changes:
>
> 1. start the service with "-l" so that things in your .profile (such as
> module load sun-java) would be picked up. However, this also means that
> you should unset X509_* stuff or the sshcl proxy forwarding will not
> work properly.
>
> 2. I fixed a bug that caused an extra connection to the coaster service.
> Normally the service connects back to the client and both use that
> connection. However, due to some changes in the way credentials were set
> for jobs, and the fact that connections were looked up based on both
> hostname and credential, the coaster client would ignore the existing
> connection and create another one. The initial one would then time out at
> some point, causing the service to crash.
>
> Mihael
>
> On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > An update on this provider staging related issue: reducing filesize
> > from 17MB to 600KB runs well.
> >
> > So seems like some kind of flow control or buffer management
> > problem, possibly?
> >
> > May need to take that problem offline - would be a perfect test
> > case for Yadu to develop a new stress test for.
> >
> > - Mike
> >
> >
> > ----- Forwarded Message -----
> > From: "Michael Wilde"
> > To: "David Kelly"
> > Sent: Saturday, March 9, 2013 5:21:49 PM
> > Subject: Re: runs for OSG talk
> >
> > OK, much better: with 600K files (5x5 reduction or 25X smaller) it
> > works well, and fast (from midway to beagle!)
> >
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> >
> > RunID: 20130309-2319-5zq0jrfg
> > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000
> > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting site:269 Submitting:47 Submitted:1
> > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting site:269 Stage in:1 Submitted:47
> > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting site:269 Stage in:47 Active:1
> > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting site:269 Stage in:46 Active:1 Stage out:1
> > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting site:250 Stage in:19 Active:28 Stage out:1 Finished successfully:19
> > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 Finished successfully:41
> > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 Finished successfully:49
> > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting site:220 Stage in:38 Active:1 Stage out:9 Finished successfully:49
> > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting site:212 Stage in:30 Submitting:8 Stage out:9 Finished successfully:58
> > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting site:203 Stage in:38 Submitting:8 Submitted:1 Finished successfully:67
> > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting site:202 Stage in:19 Stage out:28 Finished successfully:68
> > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage out:2 Finished successfully:97
> > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting site:170 Stage in:31 Submitting:2 Stage out:14 Finished successfully:100
> > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting site:162 Stage in:30 Submitting:10 Stage out:6 Finished successfully:109
> > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 Finished successfully:115
> > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting site:154 Stage in:21 Active:10 Stage out:16 Finished successfully:116
> > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 Finished successfully:143
> > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting site:124 Stage in:31 Active:2 Stage out:15 Finished successfully:145
> > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting site:110 Stage in:30 Submitting:14 Stage out:3 Finished successfully:160
> > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage out:2 Finished successfully:163
> > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 Finished successfully:165
> > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished successfully:191
> > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 Stage in:30 Stage out:17 Finished successfully:194
> > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 Stage in:29 Submitting:18 Active:1 Finished successfully:211
> > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 Stage in:33 Active:3 Stage out:12 Finished successfully:211
> > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 Finished successfully:225
> > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 Stage in:29 Active:14 Stage out:3 Finished successfully:241
> > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 Stage in:28 Submitting:2 Stage out:17 Finished successfully:242
> > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 Stage in:30 Submitting:17 Submitted:1 Finished successfully:259
> > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 Stage in:35 Stage out:13 Finished successfully:259
> > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 Submitting:6 Submitted:3 Stage out:15 Finished successfully:272
> > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 Active:5 Stage out:14 Finished successfully:288
> > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished successfully:317
> >
> > real 0m58.953s
> > user 0m32.573s
> > sys 0m1.263s
> > + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed
> > midway001$
> >
> >
> >
> > ----- Original Message -----
> > > From: "David Kelly"
> > > To: "Michael Wilde"
> > > Sent: Saturday, March 9, 2013 5:12:59 PM
> > > Subject: Re: runs for OSG talk
> > >
> > >
> > > Yep - I had a version where the input files were in a very similar
> > > format (PGM, 1 byte per pixel). I'll add that back, but without the
> > > small PGM header in the files.
> > >
> > > ----- Original Message -----
> > >
> > >
> > > From: "Michael Wilde"
> > > To: "David Kelly"
> > > Sent: Saturday, March 9, 2013 5:04:43 PM
> > > Subject: Re: runs for OSG talk
> > >
> > > I think we need to cut down the size of these files for a demo
> > > (although they are great for a stress test).
> > >
> > > First, the RGB format by itself uses 3 bytes per pixel when it only
> > > needs one (for land use)
> > >
> > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4).
> > >
> > > I tried that using simple convert statements, but it always seems to
> > > yield a file exactly double what it should be.
> > >
> > > More on this later; was hoping to get things working "as is" first.
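The size arithmetic in the message above (3 bytes per pixel versus 1, and a 3x3 or 4x4 reduction, alongside the 5x5 reduction mentioned earlier in the thread) can be sketched as follows. This is only a rough illustration: the 2400 x 2400 tile side is an assumption, since the thread never states the image dimensions, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope sizes for headerless (raw) square image tiles.
# ASSUMPTION: a 2400 x 2400 pixel tile. The thread does not give the dimensions.
TILE_SIDE = 2400  # hypothetical value


def tile_bytes(side, bytes_per_pixel, reduction=1):
    """Bytes in a raw square tile after shrinking each side by `reduction`."""
    reduced_side = side // reduction
    return reduced_side * reduced_side * bytes_per_pixel


def report(side=TILE_SIDE):
    """List (bytes_per_pixel, reduction, size_in_bytes) for the cases discussed."""
    rows = []
    for bytes_per_pixel in (3, 1):       # RGB vs. one byte per pixel
        for reduction in (1, 3, 4, 5):   # none, 3x3, 4x4, 5x5
            rows.append((bytes_per_pixel, reduction,
                         tile_bytes(side, bytes_per_pixel, reduction)))
    return rows


if __name__ == "__main__":
    for bytes_per_pixel, reduction, size in report():
        print(f"{bytes_per_pixel} byte(s)/pixel, "
              f"{reduction}x{reduction} reduction: {size / 1e6:.2f} MB")
```

Under the assumed tile size, the full RGB tile comes to about 17 MB and the 5x5-reduced RGB tile to about 0.7 MB, which is roughly in line with the 17MB and 600K figures quoted in the thread. Dropping to one byte per pixel would divide each figure by a further factor of 3.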
> > >
> > > I assume you could get the perl code to work on one-byte-per-pixel
> > > instead of the default 3 for the convert rgb format?
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > > > From: "David Kelly"
> > > > To: "Michael Wilde"
> > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > >
> > > > That would probably be a good idea for a new script, to show how to
> > > > stage apps like that. For now I updated the scripts on lustre..
> > > > hopefully that helps.
> > > >
> > > > ----- Original Message -----
> > > >
> > > >
> > > > From: "Michael Wilde"
> > > > To: "David Kelly"
> > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > > OK, I see that it's trying to run getlanduse.sh from your /lustre dir
> > > > on beagle, which is different than the one I've got checked out. It
> > > > seems to get an error in a stderr redirect??? Let me see what I need
> > > > to do to get the beagle side in sync.
> > > >
> > > > Seems like since these are perl scripts, we should make the app()
> > > > /bin/sh and send the script as data, perhaps?
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "Michael Wilde"
> > > > > To: "David Kelly"
> > > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > OK, making progress. Now I dialed down the throttle and node counts
> > > > > to 48 jobs.
> > > > >
> > > > > Now I get further, for ./demo and site=4 script=2:
> > > > >
> > > > > RunID: 20130309-2214-1oi3rvea
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36 Active:12
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
> > > > > Execution failed:
> > > > > Exception in getlanduse:
> > > > > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > > > > Host: beagle
> > > > > Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > > > >
> > > > > Caused by:
> > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > > > > getLandUse, modis02.swift, line 20
> > > > >
> > > > > real 2m31.463s
> > > > > user 1m33.238s
> > > > > sys 0m2.160s
> > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> > > > > midway001$
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "David Kelly"
> > > > > > To: "Michael Wilde"
> > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > >
> > > > > > ok, I'll take a look at that. The run dir I used was
> > > > > > /scratch/midway/davidkelly999/modis/run011
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >
> > > > > >
> > > > > > From: "Michael Wilde"
> > > > > > To: "David Kelly"
> > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > > I just tried this, but didnt work - same prob.
> > > > > >
> > > > > > But if its working for you now, we must be close.
> > > > > >
> > > > > > Not yet sure what the diff is...
> > > > > >
> > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly"
> > > > > > > To: "Michael Wilde"
> > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > >
> > > > > > > Had to make sure I was using the IP address on eth4 (128.135.112.71
> > > > > > > for midway-login1), not a local address or an infiniband address.
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >
> > > > > > >
> > > > > > > From: "David Kelly"
> > > > > > > To: "Michael Wilde"
> > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > >
> > > > > > > I just got it working.
I had to adjust for the differences in my username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME on Midway to the IP address, rather than the full hostname
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >
> > > > > > >
> > > > > > > From: "Michael Wilde"
> > > > > > > To: "David Kelly"
> > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "David Kelly"
> > > > > > > > To: "Michael Wilde"
> > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > >
> > > > > > > > Is your username the same on beagle and midway?
> > > > > > >
> > > > > > > Yes. And I verified that I can ssh to login4 on beagle from my
> > > > > > > midway session (as indeed the scp's of the proxy files seem to be
> > > > > > > working)
> > > > > > >
> > > > > > > - Mike
> > > > > > >
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > >
> > > > > > > >
> > > > > > > > From: "Michael Wilde"
> > > > > > > > To: "David Kelly"
> > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > OK.
> > > > > > > >
> > > > > > > > Ignore what I said about "problem finding java" - thats code in
> > > > > > > > the very long escaped shell command that gets sent to the remote
> > > > > > > > side. I dont *think* thats the problem.
> > > > > > > >
> > > > > > > > I also verified that beagle can connect to ports 50001 etc on
> > > > > > > > swift.rcc, and that seems OK.
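A reachability check like the one described above (confirming that the worker side can open a TCP connection back to the submit host's service port) could be sketched as below. This is a generic illustration using only the Python standard library, not something taken from Swift or the coaster code; the function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and attempts a TCP connect;
        # resolution and connection errors are both subclasses of OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the remote side, a call such as `can_connect("<submit host>", 50001)` (with the host and port taken from the actual run, both placeholders here) reports whether the callback port is reachable. It says nothing about whether the credentials or the hostname advertised through GLOBUS_HOSTNAME are correct.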
> > > > > > > >
> > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on the
> > > > > > > > midway side. And the beagle side seems to be connecting there.
> > > > > > > >
> > > > > > > > Im a bit confused about the timestamps I see for the proxy
> > > > > > > > expiration time, but am not yet suspicious of that (although it
> > > > > > > > seems less than 5 hours past GMT... not sure.)
> > > > > > > >
> > > > > > > > - Mike
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > > From: "David Kelly"
> > > > > > > > > To: "Michael Wilde"
> > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm seeing the same error now.. looking into it
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > From: "Michael Wilde"
> > > > > > > > > To: "David Kelly"
> > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > >
> > > > > > > > > Looking deeper I see that the logs show problems with finding
> > > > > > > > > Java, I assume on beagle, and also service ending (presumably
> > > > > > > > > coaster service on midway host).
> > > > > > > > >
> > > > > > > > > I'll dig into these two.
> > > > > > > > >
> > > > > > > > > I see that it scp's the proxies to beagle which I think answers
> > > > > > > > > my question about security.
> > > > > > > > >
> > > > > > > > > - Mike
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > To: "David Kelly"
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > >
> > > > > > > > > > I've been experimenting but still can't get it to work, same
> > > > > > > > > > error (can't connect to bootstrap port)
> > > > > > > > > >
> > > > > > > > > > When you tried ssh-cl to beagle with automatic coasters, what
> > > > > > > > > > configuration (sites env etc) did you use?
> > > > > > > > > >
> > > > > > > > > > I verified that beagle can connect back to the midway hosts
> > > > > > > > > > and ports.
> > > > > > > > > >
> > > > > > > > > > Do we need to specify security or create a proxy etc?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > One way you can override/customize the default templates is to
> > > > > > > > > > > create them in $HOME/.swift/sites (I'm not sure if that's what
> > > > > > > > > > > you mean by a local sites dir or not).
But you are right > > > > > > > > > > > about > > > > > > > > > > > Midway > > > > > > > > > > > - > > > > > > > > > > > I > > > > > > > > > > > have > > > > > > > > > > > noticed that when using modis it will sometimes > > > > > > > > > > > get > > > > > > > > > > > stuck > > > > > > > > > > > when > > > > > > > > > > > it > > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > > replication > > > > > > > > > > > would > > > > > > > > > > > be > > > > > > > > > > > able to help better handle that, but I haven't > > > > > > > > > > > had > > > > > > > > > > > much > > > > > > > > > > > luck > > > > > > > > > > > with > > > > > > > > > > > that yet. Another way around this may be to add > > > > > > > > > > > this > > > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > > swift-devel > > > > > > > > > > > for > > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > > relatively > > > > > > > > > > > simple > > > > > > > > > > > though.. probably worth fixing before release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to > > > > > > > > > > > stay > > > > > > > > > > > Tue > > > > > > > > > > > night > > > > > > > > > > > to > > > > > > > > > > > avoid a 4hr drive after a long day. 
> > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can > > > > > > > > > > > modify > > > > > > > > > > > the > > > > > > > > > > > sites > > > > > > > > > > > templates; thats not working for me either yet. > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb > > > > > > > > > > > (but > > > > > > > > > > > not > > > > > > > > > > > both) > > > > > > > > > > > and > > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > > filled > > > > > > > > > > > and > > > > > > > > > > > not > > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > > fiddle > > > > > > > > > > > jobsPerNode > > > > > > > > > > > to get at least 1 core when the system is busy > > > > > > > > > > > and > > > > > > > > > > > *pretend* > > > > > > > > > > > that > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > > working > > > > > > > > > > > because > > > > > > > > > > > the > > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > produced - > > > > > > > > > > > I > > > > > > > > > > > thought > > > > > > > > > > > we > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > problem > > > > > > > > > > > with > > > > > > > > > > > that > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > > most > > > > > > > > > > > > interesting/useful talks will be on Tuesday. > > > > > > > > > > > > Monday > > > > > > > > > > > > I'll > > > > > > > > > > > > come > > > > > > > > > > > > to > > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > > finishing > > > > > > > > > > > > touches > > > > > > > > > > > > on > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > on > > > > > > > > > > > > Monday > > > > > > > > > > > > afternoon/evening. I have a hotel booked for > > > > > > > > > > > > Monday > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked > > > > > > > > > > > > about. > > > > > > > > > > > > I'm > > > > > > > > > > > > pretty > > > > > > > > > > > > sure > > > > > > > > > > > > I > > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > > talked > > > > > > > > > > > > about, > > > > > > > > > > > > so > > > > > > > > > > > > I > > > > > > > > > > > > think it's really just a matter of plugging in > > > > > > > > > > > > the > > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking > > > > > > > > > > > > into > > > > > > > > > > > > the > > > > > > > > > > > > run > > > > > > > > > > > > options > > > > > > > > > > > > now. Im hoping to try a few... WIll see how > > > > > > > > > > > > much > > > > > > > > > > > > help > > > > > > > > > > > > I > > > > > > > > > > > > need. > > > > > > > > > > > > Have > > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion > > > > > > > > > > > > of > > > > > > > > > > > > the > > > > > > > > > > > > OSG > > > > > > > > > > > > meeting > > > > > > > > > > > > you > > > > > > > > > > > > feel is of value. The only thing I ask is that > > > > > > > > > > > > for > > > > > > > > > > > > Wed > > > > > > > > > > > > and > > > > > > > > > > > > Thu > > > > > > > > > > > > you > > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > > assistance > > > > > > > > > > > > needs > > > > > > > > > > > > that come up here. And that you engage with > > > > > > > > > > > > people > > > > > > > > > > > > that > > > > > > > > > > > > can > > > > > > > > > > > > help > > > > > > > > > > > > us > > > > > > > > > > > > develop the Swift user community and reliable > > > > > > > > > > > > OSG > > > > > > > > > > > > usage. 
> > > > > > > > > > > > Rob, > > > > > > > > > > > > Marco, > > > > > > > > > > > > Lincoln, and Suchandra would be good to hang > > > > > > > > > > > > out > > > > > > > > > > > > with > > > > > > > > > > > > and > > > > > > > > > > > > they > > > > > > > > > > > > can > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > UChicago > > > > > > > > > > > > travel > > > > > > > > > > > > expense > > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > > additional > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > funds to make Swift do smarter data management > > > > > > > > > > > > on > > > > > > > > > > > > OSG > > > > > > > > > > > > sites > > > > > > > > > > > > (and > > > > > > > > > > > > in > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > storage > > > > > > > > > > > > elements/services/tools will be valuable for > > > > > > > > > > > > that > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on > > > > > > > > > > > > the > > > > > > > > > > > > talk, > > > > > > > > > > > > OK? > > > > > > > > > > > > Im > > > > > > > > > > > > hoping > > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > tests > > > > > > > > > > > > to cover the "routes" we discussed, that would > > > > > > > > > > > > pave > > > > > > > > > > > > the > > > > > > > > > > > > way > > > > > > > > > > > > for > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > >
> > > > > > > > > > > > - Mike
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > Argonne National Laboratory

From wilde at mcs.anl.gov Sun Mar 10 12:01:53 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 12:01:53 -0500 (CDT)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1362900986.29839.4.camel@echo>
Message-ID: <52673064.1276305.1362934913928.JavaMail.root@mcs.anl.gov>

Here's run034: seems to be a bit better, but still dies.

This is with throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc to beagle. 17MB files.

Still seems to curiously die about 4 mins into the run, which suggests some kind of timeout is still lurking???

- Mike Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) RunID: 20130310-1639-kyb8hca9 Progress: time: Sun, 10 Mar 2013 16:39:45 +0000 Progress: time: Sun, 10 Mar 2013 16:39:56 +0000 Selecting site:269 Submitting:47 Submitted:1 Progress: time: Sun, 10 Mar 2013 16:40:01 +0000 Selecting site:269 Stage in:1 Submitted:47 Progress: time: Sun, 10 Mar 2013 16:40:15 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 16:40:45 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 16:41:15 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 16:41:45 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 16:42:11 +0000 Selecting site:269 Stage in:47 Active:1 Progress: time: Sun, 10 Mar 2013 16:42:12 +0000 Selecting site:269 Stage in:41 Active:7 Progress: time: Sun, 10 Mar 2013 16:42:13 +0000 Selecting site:269 Stage in:23 Active:25 Progress: time: Sun, 10 Mar 2013 16:42:15 +0000 Selecting site:269 Active:48 Progress: time: Sun, 10 Mar 2013 16:42:17 +0000 Selecting site:269 Active:47 Stage out:1 Progress: time: Sun, 10 Mar 2013 16:42:18 +0000 Selecting site:268 Stage in:1 Active:46 Stage out:1 Finished successfully:1 Progress: time: Sun, 10 Mar 2013 16:42:19 +0000 Selecting site:265 Stage in:3 Submitted:1 Active:42 Stage out:2 Finished successfully:4 Progress: time: Sun, 10 Mar 2013 16:42:20 +0000 Selecting site:258 Stage in:6 Submitting:5 Active:23 Stage out:13 Finished successfully:12 Progress: time: Sun, 10 Mar 2013 16:42:21 +0000 Selecting site:244 Stage in:24 Submitting:1 Active:20 Stage out:3 Finished successfully:25 Progress: time: Sun, 10 Mar 2013 16:42:23 +0000 Selecting site:241 Stage in:25 Submitting:3 Stage out:19 Finished successfully:29 Progress: time: Sun, 10 Mar 2013 16:42:24 +0000 Selecting site:221 Stage in:28 Submitting:19 Submitted:1 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 16:42:45 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sun, 
10 Mar 2013 16:42:54 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 16:43:00 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 16:43:02 +0000 Selecting site:221 Stage in:47 Finished successfully:49 Progress: time: Sun, 10 Mar 2013 16:43:05 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 Progress: time: Sun, 10 Mar 2013 16:43:15 +0000 Selecting site:220 Stage in:48 Finished successfully:49 Progress: time: Sun, 10 Mar 2013 16:43:45 +0000 Selecting site:220 Stage in:48 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel} Context: service-60859 Meta context: service-60519 Progress: time: Sun, 10 Mar 2013 16:43:59 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel} Context: service-60663 Meta context: 
service-60519 Progress: time: Sun, 10 Mar 2013 16:44:05 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49 Progress: time: Sun, 10 Mar 2013 16:44:07 +0000 Selecting site:220 Stage in:47 Finished successfully:50 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel} Context: service-60081 Meta context: service-60519 Progress: time: Sun, 10 Mar 2013 16:44:09 +0000 Selecting site:219 Stage in:45 Submitting:1 Active:2 Finished successfully:50 Execution failed: Exception in getlanduse: Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb] Host: beagle Directory: modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l Caused by: Shutting down worker getLandUse, modis02.swift, line 20 error null real 4m27.007s user 2m44.221s sys 0m3.448s + mv /home/wilde/.swift/runs/current/run034.1362933583 /home/wilde/.swift/runs/completed midway001$ ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Sunday, March 10, 2013 1:36:26 AM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > Please try now. I made some changes: > > 1. start the service with "-l" so that things in your .profile (such > as > module load sun-java) would be picked up. However, this also means > that > you should unset X509_* stuff or the sshcl proxy forwarding will not > work properly. > > 2. I fixed a bug that caused an extra connection to the coaster > service. 
> Normally the service connects back to the client and both use that
> connection. However, due to some changes in the way credentials were set
> for jobs, and the fact that connections were looked up based on both
> hostname and credential, the coaster client would ignore the existing
> connection and create another one. The initial one would then time out at
> some point, causing the service to crash.
>
> Mihael
>
> On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > An update on this provider staging related issue: reducing filesize
> > from 17MB to 600KB runs well.
> >
> > So seems like some kind of flow control or buffer management
> > problem, possibly?
> >
> > May need to take that problem offline - would be a perfect test
> > case for Yadu to develop a new stress test for.
> >
> > - Mike
> >
> >
> > ----- Forwarded Message -----
> > From: "Michael Wilde"
> > To: "David Kelly"
> > Sent: Saturday, March 9, 2013 5:21:49 PM
> > Subject: Re: runs for OSG talk
> >
> > OK, much better: with 600K files (5x5 reduction or 25X smaller) it
> > works well, and fast (from midway to beagle!)
> > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130309-2319-5zq0jrfg > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting > > site:269 Submitting:47 Submitted:1 > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting > > site:269 Stage in:1 Submitted:47 > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting > > site:269 Stage in:47 Active:1 > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting > > site:269 Stage in:46 Active:1 Stage out:1 > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting > > site:250 Stage in:19 Active:28 Stage out:1 Finished > > successfully:19 > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting > > site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 > > Finished successfully:41 > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting > > site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 > > Finished successfully:49 > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting > > site:220 Stage in:38 Active:1 Stage out:9 Finished > > successfully:49 > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting > > site:212 Stage in:30 Submitting:8 Stage out:9 Finished > > successfully:58 > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting > > site:203 Stage in:38 Submitting:8 Submitted:1 Finished > > successfully:67 > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting > > site:202 Stage in:19 Stage out:28 Finished successfully:68 > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting > > site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage > > out:2 Finished successfully:97 > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting > > site:170 Stage in:31 Submitting:2 Stage out:14 Finished > > successfully:100 > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting > > site:162 Stage in:30 Submitting:10 Stage out:6 Finished > > successfully:109 > > 
Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting > > site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 > > Finished successfully:115 > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting > > site:154 Stage in:21 Active:10 Stage out:16 Finished > > successfully:116 > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting > > site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 > > Finished successfully:143 > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting > > site:124 Stage in:31 Active:2 Stage out:15 Finished > > successfully:145 > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting > > site:110 Stage in:30 Submitting:14 Stage out:3 Finished > > successfully:160 > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting > > site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage > > out:2 Finished successfully:163 > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting > > site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 > > Finished successfully:165 > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished > > successfully:191 > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 > > Stage in:30 Stage out:17 Finished successfully:194 > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 > > Stage in:29 Submitting:18 Active:1 Finished successfully:211 > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 > > Stage in:33 Active:3 Stage out:12 Finished successfully:211 > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 > > Finished successfully:225 > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 > > Stage in:29 Active:14 Stage out:3 Finished successfully:241 > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 > > Stage in:28 Submitting:2 Stage out:17 
Finished > > successfully:242 > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 > > Stage in:30 Submitting:17 Submitted:1 Finished > > successfully:259 > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 > > Stage in:35 Stage out:13 Finished successfully:259 > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > > Submitting:6 Submitted:3 Stage out:15 Finished > > successfully:272 > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 > > Active:5 Stage out:14 Finished successfully:288 > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > successfully:317 > > > > real 0m58.953s > > user 0m32.573s > > sys 0m1.263s > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > /home/wilde/.swift/runs/completed > > midway001$ > > > > > > > > ----- Original Message ----- > > > From: "David Kelly" > > > To: "Michael Wilde" > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > Subject: Re: runs for OSG talk > > > > > > > > > Yep - I had a version where the input files were in a very > > > similar > > > format (PGM, 1 byte per pixel). I'll add that back, but without > > > the > > > small PGM header in the files. > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > To: "David Kelly" > > > Sent: Saturday, March 9, 2013 5:04:43 PM > > > Subject: Re: runs for OSG talk > > > > > > I think we need to cut down the size of these files for a demo > > > (although they are great for a stress test). > > > > > > First, the RGB format by itself uses 3 bytes per pixel when it > > > only > > > needs one (for land use) > > > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). > > > > > > I tried that using simple convert statements, but it always seems > > > to > > > yield a file exactly double what it should be. > > > > > > More on this later; was hoping to get things working "as is" > > > first. 
> > >
> > > I assume you could get the perl code to work on one-byte-per-pixel
> > > instead of the default 3 for the convert rgb format?
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > > > From: "David Kelly"
> > > > To: "Michael Wilde"
> > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > >
> > > > That would probably be a good idea for a new script, to show how to
> > > > stage apps like that. For now I updated the scripts on lustre..
> > > > hopefully that helps.
> > > >
> > > > ----- Original Message -----
> > > >
> > > >
> > > > From: "Michael Wilde"
> > > > To: "David Kelly"
> > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > Subject: Re: runs for OSG talk
> > > >
> > > > OK, I see that it's trying to run getlanduse.sh from your /lustre dir
> > > > on beagle, which is different than the one I've got checked out. It
> > > > seems to get an error in a stderr redirect??? Let me see what I need
> > > > to do to get the beagle side in sync.
> > > >
> > > > Seems like since these are perl scripts, we should make the app()
> > > > /bin/sh and send the script as data, perhaps?
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > > > From: "Michael Wilde"
> > > > > To: "David Kelly"
> > > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > OK, making progress. Now I dialed down the throttle and node counts
> > > > > to 48 jobs.
> > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > > > site:269 > > > > > Submitting:47 Submitted:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > > > site:269 > > > > > Stage in:1 Submitted:47 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > > > site:269 > > > > > Stage in:25 Submitted:23 > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > > > site:269 > > > > > Stage in:48 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > > > site:269 > > > > > Stage in:47 Active:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > > > site:269 > > > > > Stage in:36 Active:12 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > > > site:269 > > > > > Stage in:24 Active:24 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > > > site:269 > > > > > Stage in:24 Active:23 Stage out:1 > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > > > site:269 > > > > > Stage in:14 Active:33 Stage out:1 > > > > > Execution failed: > > > > > Exception in getlanduse: > > > > > Arguments: > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > Host: beagle > > > > > Directory: > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > Caused by: > > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > failed > > > > > with an 
exit code of 1 > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > real 2m31.463s > > > > > user 1m33.238s > > > > > sys 0m2.160s > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > /home/wilde/.swift/runs/completed > > > > > midway001$ > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: "David Kelly" > > > > > > To: "Michael Wilde" > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used was > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > Not yet sure what the diff is... > > > > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > > (128.135.112.71 > > > > > > > for midway-login1), not a local address or an infiniband > > > > > > > address. > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > I just got it working. 
I had to adjust for the > > > > > > > differences in > > > > > > > my > > > > > > > username on Beagle/Midway, then I had to set > > > > > > > GLOBUS_HOSTNAME > > > > > > > on > > > > > > > Midway to the IP address, rather than the full hostname > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > > > Yes. And I verified that I can ssh to login4 on beagle > > > > > > > from > > > > > > > my > > > > > > > midway > > > > > > > session (as indeed the scp's of the proxy files seem to > > > > > > > be > > > > > > > working) > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > > > Ignore what I said about "problem finding java" - thats > > > > > > > > code > > > > > > > > in > > > > > > > > the > > > > > > > > very long escaped shell command that gets sent to the > > > > > > > > remote > > > > > > > > side. > > > > > > > > I > > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 > > > > > > > > etc > > > > > > > > on > > > > > > > > swift.rcc, and that seems OK. 
> > > > > > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu > > > > > > > > on > > > > > > > > the > > > > > > > > midway > > > > > > > > side. And the beagle side seems to be connecting there. > > > > > > > > > > > > > > > > Im a bit confused about the timestamps I see for the > > > > > > > > proxy > > > > > > > > expiration > > > > > > > > time, but am not yet suspicious of that (although it > > > > > > > > seems > > > > > > > > less > > > > > > > > than > > > > > > > > 5 hours past GMT... not sure.) > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. looking into it > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with > > > > > > > > > finding > > > > > > > > > Java, > > > > > > > > > I > > > > > > > > > assume on beagle, ans also service ending (presumably > > > > > > > > > coaster > > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I > > > > > > > > > think > > > > > > > > > answers > > > > > > > > > my > > > > > > > > > question about security. 
> > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to > > > > > > > > > > work, > > > > > > > > > > same > > > > > > > > > > error > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with automatic > > > > > > > > > > coasters, > > > > > > > > > > what > > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the > > > > > > > > > > midway > > > > > > > > > > hosts > > > > > > > > > > and > > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy > > > > > > > > > > etc? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the default > > > > > > > > > > > templates > > > > > > > > > > > is > > > > > > > > > > > to > > > > > > > > > > > create > > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if > > > > > > > > > > > that's > > > > > > > > > > > what > > > > > > > > > > > you > > > > > > > > > > > mean > > > > > > > > > > > by > > > > > > > > > > > a local sites dir or not). 
But you are right > > > > > > > > > > > about > > > > > > > > > > > Midway > > > > > > > > > > > - > > > > > > > > > > > I > > > > > > > > > > > have > > > > > > > > > > > noticed that when using modis it will sometimes > > > > > > > > > > > get > > > > > > > > > > > stuck > > > > > > > > > > > when > > > > > > > > > > > it > > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > > replication > > > > > > > > > > > would > > > > > > > > > > > be > > > > > > > > > > > able to help better handle that, but I haven't > > > > > > > > > > > had > > > > > > > > > > > much > > > > > > > > > > > luck > > > > > > > > > > > with > > > > > > > > > > > that yet. Another way around this may be to add > > > > > > > > > > > this > > > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > > swift-devel > > > > > > > > > > > for > > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > > relatively > > > > > > > > > > > simple > > > > > > > > > > > though.. probably worth fixing before release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to > > > > > > > > > > > stay > > > > > > > > > > > Tue > > > > > > > > > > > night > > > > > > > > > > > to > > > > > > > > > > > avoid a 4hr drive after a long day. 
> > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can > > > > > > > > > > > modify > > > > > > > > > > > the > > > > > > > > > > > sites > > > > > > > > > > > templates; thats not working for me either yet. > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb > > > > > > > > > > > (but > > > > > > > > > > > not > > > > > > > > > > > both) > > > > > > > > > > > and > > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > > filled > > > > > > > > > > > and > > > > > > > > > > > not > > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > > fiddle > > > > > > > > > > > jobsPerNode > > > > > > > > > > > to get at least 1 core when the system is busy > > > > > > > > > > > and > > > > > > > > > > > *pretend* > > > > > > > > > > > that > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > > working > > > > > > > > > > > because > > > > > > > > > > > the > > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > produced - > > > > > > > > > > > I > > > > > > > > > > > thought > > > > > > > > > > > we > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > problem > > > > > > > > > > > with > > > > > > > > > > > that > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > > most > > > > > > > > > > > > interesting/useful talks will be on Tuesday. > > > > > > > > > > > > Monday > > > > > > > > > > > > I'll > > > > > > > > > > > > come > > > > > > > > > > > > to > > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > > finishing > > > > > > > > > > > > touches > > > > > > > > > > > > on > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > on > > > > > > > > > > > > Monday > > > > > > > > > > > > afternoon/evening. I have a hotel booked for > > > > > > > > > > > > Monday > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked > > > > > > > > > > > > about. > > > > > > > > > > > > I'm > > > > > > > > > > > > pretty > > > > > > > > > > > > sure > > > > > > > > > > > > I > > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > > talked > > > > > > > > > > > > about, > > > > > > > > > > > > so > > > > > > > > > > > > I > > > > > > > > > > > > think it's really just a matter of plugging in > > > > > > > > > > > > the > > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking > > > > > > > > > > > > into > > > > > > > > > > > > the > > > > > > > > > > > > run > > > > > > > > > > > > options > > > > > > > > > > > > now. Im hoping to try a few... WIll see how > > > > > > > > > > > > much > > > > > > > > > > > > help > > > > > > > > > > > > I > > > > > > > > > > > > need. > > > > > > > > > > > > Have > > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion > > > > > > > > > > > > of > > > > > > > > > > > > the > > > > > > > > > > > > OSG > > > > > > > > > > > > meeting > > > > > > > > > > > > you > > > > > > > > > > > > feel is of value. The only thing I ask is that > > > > > > > > > > > > for > > > > > > > > > > > > Wed > > > > > > > > > > > > and > > > > > > > > > > > > Thu > > > > > > > > > > > > you > > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > > assistance > > > > > > > > > > > > needs > > > > > > > > > > > > that come up here. And that you engage with > > > > > > > > > > > > people > > > > > > > > > > > > that > > > > > > > > > > > > can > > > > > > > > > > > > help > > > > > > > > > > > > us > > > > > > > > > > > > develop the Swift user community and reliable > > > > > > > > > > > > OSG > > > > > > > > > > > > usage. 
> > > > > > > > > > > > Rob, > > > > > > > > > > > > Marco, > > > > > > > > > > > > Lincoln, and Suchandra would be good to hang > > > > > > > > > > > > out > > > > > > > > > > > > with > > > > > > > > > > > > and > > > > > > > > > > > > they > > > > > > > > > > > > can > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > UChicago > > > > > > > > > > > > travel > > > > > > > > > > > > expense > > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > > additional > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > funds to make Swift do smarter data management > > > > > > > > > > > > on > > > > > > > > > > > > OSG > > > > > > > > > > > > sites > > > > > > > > > > > > (and > > > > > > > > > > > > in > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > storage > > > > > > > > > > > > elements/services/tools will be valuable for > > > > > > > > > > > > that > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on > > > > > > > > > > > > the > > > > > > > > > > > > talk, > > > > > > > > > > > > OK? > > > > > > > > > > > > Im > > > > > > > > > > > > hoping > > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > tests > > > > > > > > > > > > to cover the "routes" we discussed, that would > > > > > > > > > > > > pave > > > > > > > > > > > > the > > > > > > > > > > > > way > > > > > > > > > > > > for > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > >
> > > > > > > > > > > > - Mike
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > Argonne National Laboratory
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

From hategan at mcs.anl.gov Sun Mar 10 15:06:25 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Mar 2013 13:06:25 -0700
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <52673064.1276305.1362934913928.JavaMail.root@mcs.anl.gov>
References: <52673064.1276305.1362934913928.JavaMail.root@mcs.anl.gov>
Message-ID: <1362945985.32419.5.camel@echo>

ChannelContext Notifying commands and handlers about exception
org.globus.cog.karajan.workflow.service.TimeoutException: Channel timed out. lastTime=940817-071255.807, now=130310-164156.506, channel=GSSSChannel-1463847073(1)[service-60519]

Are you sure you are running with the latest code? There was a (mostly inconsequential) bug before that set lastTime to Long.MAX_VALUE before creating that exception. That was fixed.
Your message indicates the code you are using does not have that fix (year xx94 is what comes out of Long.MAX_VALUE).

I gotta go now, but I'll come back later and check some more. There is something weird going on there besides that.

Mihael

On Sun, 2013-03-10 at 12:01 -0500, Michael Wilde wrote:
> Here's run034: seems to be a bit better, but still dies. This is with throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc to beagle. 17MB files. Still seems to curiously die about 4 mins into the run, which suggests some kind of timeout is still lurking???
>
> - Mike
>
> Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
>
> RunID: 20130310-1639-kyb8hca9
> Progress: time: Sun, 10 Mar 2013 16:39:45 +0000
> Progress: time: Sun, 10 Mar 2013 16:39:56 +0000 Selecting site:269 Submitting:47 Submitted:1
> Progress: time: Sun, 10 Mar 2013 16:40:01 +0000 Selecting site:269 Stage in:1 Submitted:47
> Progress: time: Sun, 10 Mar 2013 16:40:15 +0000 Selecting site:269 Stage in:48
> Progress: time: Sun, 10 Mar 2013 16:40:45 +0000 Selecting site:269 Stage in:48
> Progress: time: Sun, 10 Mar 2013 16:41:15 +0000 Selecting site:269 Stage in:48
> Progress: time: Sun, 10 Mar 2013 16:41:45 +0000 Selecting site:269 Stage in:48
> Progress: time: Sun, 10 Mar 2013 16:42:11 +0000 Selecting site:269 Stage in:47 Active:1
> Progress: time: Sun, 10 Mar 2013 16:42:12 +0000 Selecting site:269 Stage in:41 Active:7
> Progress: time: Sun, 10 Mar 2013 16:42:13 +0000 Selecting site:269 Stage in:23 Active:25
> Progress: time: Sun, 10 Mar 2013 16:42:15 +0000 Selecting site:269 Active:48
> Progress: time: Sun, 10 Mar 2013 16:42:17 +0000 Selecting site:269 Active:47 Stage out:1
> Progress: time: Sun, 10 Mar 2013 16:42:18 +0000 Selecting site:268 Stage in:1 Active:46 Stage out:1 Finished successfully:1
> Progress: time: Sun, 10 Mar 2013 16:42:19 +0000 Selecting site:265 Stage in:3 Submitted:1 Active:42 Stage out:2 Finished successfully:4
> Progress: time: Sun, 10 Mar 2013 16:42:20 +0000 Selecting
site:258 Stage in:6 Submitting:5 Active:23 Stage out:13 Finished successfully:12
> Progress: time: Sun, 10 Mar 2013 16:42:21 +0000 Selecting site:244 Stage in:24 Submitting:1 Active:20 Stage out:3 Finished successfully:25
> Progress: time: Sun, 10 Mar 2013 16:42:23 +0000 Selecting site:241 Stage in:25 Submitting:3 Stage out:19 Finished successfully:29
> Progress: time: Sun, 10 Mar 2013 16:42:24 +0000 Selecting site:221 Stage in:28 Submitting:19 Submitted:1 Finished successfully:48
> Progress: time: Sun, 10 Mar 2013 16:42:45 +0000 Selecting site:221 Stage in:48 Finished successfully:48
> Progress: time: Sun, 10 Mar 2013 16:42:54 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48
> Progress: time: Sun, 10 Mar 2013 16:43:00 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48
> Progress: time: Sun, 10 Mar 2013 16:43:02 +0000 Selecting site:221 Stage in:47 Finished successfully:49
> Progress: time: Sun, 10 Mar 2013 16:43:05 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49
> Progress: time: Sun, 10 Mar 2013 16:43:15 +0000 Selecting site:220 Stage in:48 Finished successfully:49
> Progress: time: Sun, 10 Mar 2013 16:43:45 +0000 Selecting site:220 Stage in:48 Finished successfully:49
> Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel}
> Context: service-60859
> Meta context: service-60519
> Progress: time: Sun, 10 Mar 2013 16:43:59 +0000 Selecting site:220 Stage in:47 Active:1 Finished
successfully:49
> Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel}
> Context: service-60663
> Meta context: service-60519
> Progress: time: Sun, 10 Mar 2013 16:44:05 +0000 Selecting site:220 Stage in:47 Stage out:1 Finished successfully:49
> Progress: time: Sun, 10 Mar 2013 16:44:07 +0000 Selecting site:220 Stage in:47 Finished successfully:50
> Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel}
> Context: service-60081
> Meta context: service-60519
> Progress: time: Sun, 10 Mar 2013 16:44:09 +0000 Selecting site:219 Stage in:45 Submitting:1 Active:2 Finished successfully:50
> Execution failed:
> Exception in getlanduse:
> Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb]
> Host: beagle
> Directory: modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l
>
> Caused by:
> Shutting down worker
> getLandUse, modis02.swift, line 20
> error null
>
> real 4m27.007s
> user 2m44.221s
> sys 0m3.448s
> + mv
/home/wilde/.swift/runs/current/run034.1362933583 /home/wilde/.swift/runs/completed
> midway001$
>
>
> ----- Original Message -----
> > From: "Mihael Hategan"
> > To: "Michael Wilde"
> > Cc: "Swift Devel"
> > Sent: Sunday, March 10, 2013 1:36:26 AM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> >
> > Please try now. I made some changes:
> >
> > 1. Start the service with "-l" so that things in your .profile (such as
> > module load sun-java) would be picked up. However, this also means that
> > you should unset X509_* stuff or the sshcl proxy forwarding will not
> > work properly.
> >
> > 2. I fixed a bug that caused an extra connection to the coaster service.
> > Normally the service connects back to the client and both use that
> > connection. However, due to some changes in the way credentials were set
> > for jobs, and the fact that connections were looked up based on both
> > hostname and credential, the coaster client would ignore the existing
> > connection and create another one. The initial one would then time out at
> > some point, causing the service to crash.
> >
> > Mihael
> >
> > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > > An update on this provider staging related issue: reducing filesize
> > > from 17MB to 600KB runs well.
> > >
> > > So seems like some kind of flow control or buffer management
> > > problem, possibly?
> > >
> > > May need to take that problem offline - would be a perfect test
> > > case for Yadu to develop a new stress test for.
> > >
> > > - Mike
> > >
> > >
> > > ----- Forwarded Message -----
> > > From: "Michael Wilde"
> > > To: "David Kelly"
> > > Sent: Saturday, March 9, 2013 5:21:49 PM
> > > Subject: Re: runs for OSG talk
> > >
> > > OK, much better: with 600K files (5x5 reduction or 25X smaller) it
> > > works well, and fast (from midway to beagle!)
> > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > RunID: 20130309-2319-5zq0jrfg > > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting > > > site:269 Submitting:47 Submitted:1 > > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting > > > site:269 Stage in:1 Submitted:47 > > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting > > > site:269 Stage in:47 Active:1 > > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting > > > site:269 Stage in:46 Active:1 Stage out:1 > > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting > > > site:250 Stage in:19 Active:28 Stage out:1 Finished > > > successfully:19 > > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting > > > site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 > > > Finished successfully:41 > > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting > > > site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 > > > Finished successfully:49 > > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting > > > site:220 Stage in:38 Active:1 Stage out:9 Finished > > > successfully:49 > > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting > > > site:212 Stage in:30 Submitting:8 Stage out:9 Finished > > > successfully:58 > > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting > > > site:203 Stage in:38 Submitting:8 Submitted:1 Finished > > > successfully:67 > > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting > > > site:202 Stage in:19 Stage out:28 Finished successfully:68 > > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting > > > site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 Stage > > > out:2 Finished successfully:97 > > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting > > > site:170 Stage in:31 Submitting:2 Stage out:14 Finished > > > successfully:100 > > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting > > > 
site:162 Stage in:30 Submitting:10 Stage out:6 Finished > > > successfully:109 > > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting > > > site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 > > > Finished successfully:115 > > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting > > > site:154 Stage in:21 Active:10 Stage out:16 Finished > > > successfully:116 > > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting > > > site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 > > > Finished successfully:143 > > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting > > > site:124 Stage in:31 Active:2 Stage out:15 Finished > > > successfully:145 > > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting > > > site:110 Stage in:30 Submitting:14 Stage out:3 Finished > > > successfully:160 > > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting > > > site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 Stage > > > out:2 Finished successfully:163 > > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting > > > site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 > > > Finished successfully:165 > > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting site:78 > > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished > > > successfully:191 > > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting site:76 > > > Stage in:30 Stage out:17 Finished successfully:194 > > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting site:58 > > > Stage in:29 Submitting:18 Active:1 Finished successfully:211 > > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting site:58 > > > Stage in:33 Active:3 Stage out:12 Finished successfully:211 > > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting site:46 > > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage out:14 > > > Finished successfully:225 > > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting site:30 > > > Stage in:29 Active:14 
Stage out:3 Finished successfully:241 > > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting site:28 > > > Stage in:28 Submitting:2 Stage out:17 Finished > > > successfully:242 > > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting site:10 > > > Stage in:30 Submitting:17 Submitted:1 Finished > > > successfully:259 > > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting site:10 > > > Stage in:35 Stage out:13 Finished successfully:259 > > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > > > Submitting:6 Submitted:3 Stage out:15 Finished > > > successfully:272 > > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 > > > Active:5 Stage out:14 Finished successfully:288 > > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > > successfully:317 > > > > > > real 0m58.953s > > > user 0m32.573s > > > sys 0m1.263s > > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > > /home/wilde/.swift/runs/completed > > > midway001$ > > > > > > > > > > > > ----- Original Message ----- > > > > From: "David Kelly" > > > > To: "Michael Wilde" > > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > Yep - I had a version where the input files were in a very > > > > similar > > > > format (PGM, 1 byte per pixel). I'll add that back, but without > > > > the > > > > small PGM header in the files. > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "Michael Wilde" > > > > To: "David Kelly" > > > > Sent: Saturday, March 9, 2013 5:04:43 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > I think we need to cut down the size of these files for a demo > > > > (although they are great for a stress test). > > > > > > > > First, the RGB format by itself uses 3 bytes per pixel when it > > > > only > > > > needs one (for land use) > > > > > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4). 
> > > > > > > > I tried that using simple convert statements, but it always seems > > > > to > > > > yield a file exactly double what it should be. > > > > > > > > More on this later; was hoping to get things working "as is" > > > > first. > > > > > > > > I assume you could get the perl code to work on > > > > one-byte-per-pixel > > > > instead of the default 3 for the convert rgb format? > > > > > > > > - Mike > > > > > > > > ----- Original Message ----- > > > > > From: "David Kelly" > > > > > To: "Michael Wilde" > > > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > That would probably be a good idea for a new script, to show > > > > > how to > > > > > stage apps like that. For now I updated the scripts on lustre.. > > > > > hopefully that helps. > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > To: "David Kelly" > > > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > OK, I see that its trying to run getlanduse.sh from your > > > > > /lustre > > > > > dir > > > > > on beagle, which is different than the one Ive got checked out. > > > > > It > > > > > seems to get an error in a stderr redirect??? Let me se what I > > > > > need > > > > > to do to get the beagle side in sync. > > > > > > > > > > Seems like since these are perl scripts, we should make the > > > > > app() > > > > > /bin/sh and send the script as data, perhaps? > > > > > > > > > > - Mike > > > > > > > > > > ----- Original Message ----- > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > OK, making progress. Now I dialed down the throttle and node > > > > > > counts > > > > > > to 48 jobs. 
> > > > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > > > > site:269 > > > > > > Submitting:47 Submitted:1 > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:1 Submitted:47 > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:25 Submitted:23 > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:48 > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:48 > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:48 > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:48 > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:47 Active:1 > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:36 Active:12 > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:24 Active:24 > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:24 Active:23 Stage out:1 > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > > > > site:269 > > > > > > Stage in:14 Active:33 Stage out:1 > > > > > > Execution failed: > > > > > > Exception in getlanduse: > > > > > > Arguments: > > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > > Host: beagle > > > > > > Directory: > > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > > > Caused by: > > > 
> > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > > failed > > > > > > with an exit code of 1 > > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > > > real 2m31.463s > > > > > > user 1m33.238s > > > > > > sys 0m2.160s > > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > > /home/wilde/.swift/runs/completed > > > > > > midway001$ > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used was > > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > > > Not yet sure what the diff is... > > > > > > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > > > (128.135.112.71 > > > > > > > > for midway-login1), not a local address or an infiniband > > > > > > > > address. 
> > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > I just got it working. I had to adjust for the > > > > > > > > differences in > > > > > > > > my > > > > > > > > username on Beagle/Midway, then I had to set > > > > > > > > GLOBUS_HOSTNAME > > > > > > > > on > > > > > > > > Midway to the IP address, rather than the full hostname > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > > > > > Yes. And I verified that I can ssh to login4 on beagle > > > > > > > > from > > > > > > > > my > > > > > > > > midway > > > > > > > > session (as indeed the scp's of the proxy files seem to > > > > > > > > be > > > > > > > > working) > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > OK. 
> > > > > > > > > > > > > > > > > > Ignore what I said about "problem finding java" - thats > > > > > > > > > code > > > > > > > > > in > > > > > > > > > the > > > > > > > > > very long escaped shell command that gets sent to the > > > > > > > > > remote > > > > > > > > > side. > > > > > > > > > I > > > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > > > > > I also verified that beagle can connect to ports 50001 > > > > > > > > > etc > > > > > > > > > on > > > > > > > > > swift.rcc, and that seems OK. > > > > > > > > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu > > > > > > > > > on > > > > > > > > > the > > > > > > > > > midway > > > > > > > > > side. And the beagle side seems to be connecting there. > > > > > > > > > > > > > > > > > > Im a bit confused about the timestamps I see for the > > > > > > > > > proxy > > > > > > > > > expiration > > > > > > > > > time, but am not yet suspicious of that (although it > > > > > > > > > seems > > > > > > > > > less > > > > > > > > > than > > > > > > > > > 5 hours past GMT... not sure.) > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "David Kelly" > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. 
looking into it > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems with > > > > > > > > > > finding > > > > > > > > > > Java, > > > > > > > > > > I > > > > > > > > > > assume on beagle, and also service ending (presumably > > > > > > > > > > coaster > > > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I > > > > > > > > > > think > > > > > > > > > > answers > > > > > > > > > > my > > > > > > > > > > question about security. > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to > > > > > > > > > > > work, > > > > > > > > > > > same > > > > > > > > > > > error > > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > > > When you tried ssh-cl to beagle with automatic > > > > > > > > > > > coasters, > > > > > > > > > > > what > > > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the > > > > > > > > > > > midway > > > > > > > > > > > hosts > > > > > > > > > > > and > > > > > > > > > > > ports.
> > > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create a proxy > > > > > > > > > > > etc? > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the default > > > > > > > > > > > > templates > > > > > > > > > > > > is > > > > > > > > > > > > to > > > > > > > > > > > > create > > > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if > > > > > > > > > > > > that's > > > > > > > > > > > > what > > > > > > > > > > > > you > > > > > > > > > > > > mean > > > > > > > > > > > > by > > > > > > > > > > > > a local sites dir or not). But you are right > > > > > > > > > > > > about > > > > > > > > > > > > Midway > > > > > > > > > > > > - > > > > > > > > > > > > I > > > > > > > > > > > > have > > > > > > > > > > > > noticed that when using modis it will sometimes > > > > > > > > > > > > get > > > > > > > > > > > > stuck > > > > > > > > > > > > when > > > > > > > > > > > > it > > > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > > > replication > > > > > > > > > > > > would > > > > > > > > > > > > be > > > > > > > > > > > > able to help better handle that, but I haven't > > > > > > > > > > > > had > > > > > > > > > > > > much > > > > > > > > > > > > luck > > > > > > > > > > > > with > > > > > > > > > > > > that yet. 
Another way around this may be to add > > > > > > > > > > > > this > > > > > > > > > > > > to > > > > > > > > > > > > the > > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went to > > > > > > > > > > > > swift-devel > > > > > > > > > > > > for > > > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > > > relatively > > > > > > > > > > > > simple > > > > > > > > > > > > though.. probably worth fixing before release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to > > > > > > > > > > > > stay > > > > > > > > > > > > Tue > > > > > > > > > > > > night > > > > > > > > > > > > to > > > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can > > > > > > > > > > > > modify > > > > > > > > > > > > the > > > > > > > > > > > > sites > > > > > > > > > > > > templates; thats not working for me either yet. 
> > > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or sandyb > > > > > > > > > > > > (but > > > > > > > > > > > > not > > > > > > > > > > > > both) > > > > > > > > > > > > and > > > > > > > > > > > > ensure 1-node jobs, because either queue can get > > > > > > > > > > > > filled > > > > > > > > > > > > and > > > > > > > > > > > > not > > > > > > > > > > > > yield an idle node for a long time. maybe need to > > > > > > > > > > > > fiddle > > > > > > > > > > > > jobsPerNode > > > > > > > > > > > > to get at least 1 core when the system is busy > > > > > > > > > > > > and > > > > > > > > > > > > *pretend* > > > > > > > > > > > > that > > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That isnt > > > > > > > > > > > > working > > > > > > > > > > > > because > > > > > > > > > > > > the > > > > > > > > > > > > template sites file is wrong in swift 0.94 rc4. > > > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > > produced - > > > > > > > > > > > > I > > > > > > > > > > > > thought > > > > > > > > > > > > we > > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > > problem > > > > > > > > > > > > with > > > > > > > > > > > > that > > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think the > > > > > > > > > > > > > most > > > > > > > > > > > > > interesting/useful talks will be on Tuesday. > > > > > > > > > > > > > Monday > > > > > > > > > > > > > I'll > > > > > > > > > > > > > come > > > > > > > > > > > > > to > > > > > > > > > > > > > Argonne to work on any loose ends and put the > > > > > > > > > > > > > finishing > > > > > > > > > > > > > touches > > > > > > > > > > > > > on > > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > > on > > > > > > > > > > > > > Monday > > > > > > > > > > > > > afternoon/evening. I have a hotel booked for > > > > > > > > > > > > > Monday > > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we talked > > > > > > > > > > > > > about. > > > > > > > > > > > > > I'm > > > > > > > > > > > > > pretty > > > > > > > > > > > > > sure > > > > > > > > > > > > > I > > > > > > > > > > > > > have working configurations for everything we > > > > > > > > > > > > > talked > > > > > > > > > > > > > about, > > > > > > > > > > > > > so > > > > > > > > > > > > > I > > > > > > > > > > > > > think it's really just a matter of plugging in > > > > > > > > > > > > > the > > > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im looking > > > > > > > > > > > > > into > > > > > > > > > > > > > the > > > > > > > > > > > > > run > > > > > > > > > > > > > options > > > > > > > > > > > > > now. Im hoping to try a few... Will see how > > > > > > > > > > > > > much > > > > > > > > > > > > > help > > > > > > > > > > > > > I > > > > > > > > > > > > > need. > > > > > > > > > > > > > Have > > > > > > > > > > > > > you decided on a driving time and made hotel > > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever portion > > > > > > > > > > > > > of > > > > > > > > > > > > > the > > > > > > > > > > > > > OSG > > > > > > > > > > > > > meeting > > > > > > > > > > > > > you > > > > > > > > > > > > > feel is of value. The only thing I ask is that > > > > > > > > > > > > > for > > > > > > > > > > > > > Wed > > > > > > > > > > > > > and > > > > > > > > > > > > > Thu > > > > > > > > > > > > > you > > > > > > > > > > > > > stay available online for user-support or other > > > > > > > > > > > > > assistance > > > > > > > > > > > > > needs > > > > > > > > > > > > > that come up here.
And that you engage with > > > > > > > > > > > > > people > > > > > > > > > > > > > that > > > > > > > > > > > > > can > > > > > > > > > > > > > help > > > > > > > > > > > > > us > > > > > > > > > > > > > develop the Swift user community and reliable > > > > > > > > > > > > > OSG > > > > > > > > > > > > > usage. > > > > > > > > > > > > > Rob, > > > > > > > > > > > > > Marco, > > > > > > > > > > > > > Lincoln, and Suchandra would be good to hang > > > > > > > > > > > > > out > > > > > > > > > > > > > with > > > > > > > > > > > > > and > > > > > > > > > > > > > they > > > > > > > > > > > > > can > > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > > UChicago > > > > > > > > > > > > > travel > > > > > > > > > > > > > expense > > > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of > > > > > > > > > > > > > additional > > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > > funds to make Swift do smarter data management > > > > > > > > > > > > > on > > > > > > > > > > > > > OSG > > > > > > > > > > > > > sites > > > > > > > > > > > > > (and > > > > > > > > > > > > > in > > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > > storage > > > > > > > > > > > > > elements/services/tools will be valuable for > > > > > > > > > > > > > that > > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus on > > > > > > > > > > > > > the > > > > > > > > > > > > > talk, > > > > > > > > > > > > > OK? > > > > > > > > > > > > > Im > > > > > > > > > > > > > hoping > > > > > > > > > > > > > we have slides frozen by Monday. 
> > > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other > > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > > tests > > > > > > > > > > > > > to cover the "routes" we discussed, that would > > > > > > > > > > > > > pave > > > > > > > > > > > > > the > > > > > > > > > > > > > way > > > > > > > > > > > > > for > > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > > > Sound good? Let me know of any concerns (other > > > > > > > > > > > > > than > > > > > > > > > > > > > the > > > > > > > > > > > > > fact > > > > > > > > > > > > > that > > > > > > > > > > > > > this is a tad rushed ;) > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks and regards, > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > Michael Wilde > > > > > > > > > > > > > Computation Institute, University of Chicago > > > > > > > > > > > > > Mathematics and Computer Science Division > > > > > > > > > > > > > Argonne National Laboratory From wilde at mcs.anl.gov Sun Mar 10 15:20:53 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 10 Mar 2013 15:20:53 -0500 (CDT) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1362945985.32419.5.camel@echo> Message-ID:
<884747621.1285936.1362946853450.JavaMail.root@mcs.anl.gov> Duh. Thank you. I didn't build a new release, was using same 0.94 RC4 code. Sorry about that. Will retest. - Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Sunday, March 10, 2013 3:06:25 PM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > ChannelContext Notifying commands and handlers about exception > org.globus.cog.karajan.workflow.service.TimeoutException: Channel > timed > out. lastTime=940817-071255.807, now=130310-164156.506, > channel=GSSSChannel-1463847073(1)[service-60519] > > Are you sure you are running with the latest code? There was a > (mostly inconsequential) bug before that set lastTime to > Long.MAX_VALUE > before creating that exception. That was fixed. Your message > indicates > the code you are using does not have that fix (year xx94 is what > comes > out of Long.MAX_VALUE). > > I gotta go now, but I'll come back later and check some more. There > is > something weird going on there besides that. > > Mihael > > On Sun, 2013-03-10 at 12:01 -0500, Michael Wilde wrote: > > Here's run034: seems to be a bit better, but still dies. This is > > with throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc to > > beagle. 17MB files. Still seems to curiously die about 4 mins > > into the run, which suggests some kind of timeout is still > > lurking???
> > > > - Mike > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > RunID: 20130310-1639-kyb8hca9 > > Progress: time: Sun, 10 Mar 2013 16:39:45 +0000 > > Progress: time: Sun, 10 Mar 2013 16:39:56 +0000 Selecting > > site:269 Submitting:47 Submitted:1 > > Progress: time: Sun, 10 Mar 2013 16:40:01 +0000 Selecting > > site:269 Stage in:1 Submitted:47 > > Progress: time: Sun, 10 Mar 2013 16:40:15 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 16:40:45 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 16:41:15 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 16:41:45 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 16:42:11 +0000 Selecting > > site:269 Stage in:47 Active:1 > > Progress: time: Sun, 10 Mar 2013 16:42:12 +0000 Selecting > > site:269 Stage in:41 Active:7 > > Progress: time: Sun, 10 Mar 2013 16:42:13 +0000 Selecting > > site:269 Stage in:23 Active:25 > > Progress: time: Sun, 10 Mar 2013 16:42:15 +0000 Selecting > > site:269 Active:48 > > Progress: time: Sun, 10 Mar 2013 16:42:17 +0000 Selecting > > site:269 Active:47 Stage out:1 > > Progress: time: Sun, 10 Mar 2013 16:42:18 +0000 Selecting > > site:268 Stage in:1 Active:46 Stage out:1 Finished > > successfully:1 > > Progress: time: Sun, 10 Mar 2013 16:42:19 +0000 Selecting > > site:265 Stage in:3 Submitted:1 Active:42 Stage out:2 > > Finished successfully:4 > > Progress: time: Sun, 10 Mar 2013 16:42:20 +0000 Selecting > > site:258 Stage in:6 Submitting:5 Active:23 Stage out:13 > > Finished successfully:12 > > Progress: time: Sun, 10 Mar 2013 16:42:21 +0000 Selecting > > site:244 Stage in:24 Submitting:1 Active:20 Stage out:3 > > Finished successfully:25 > > Progress: time: Sun, 10 Mar 2013 16:42:23 +0000 Selecting > > site:241 Stage in:25 Submitting:3 Stage out:19 Finished > > successfully:29 > > Progress: time: Sun, 10 Mar 2013 16:42:24 +0000 Selecting > > site:221 Stage 
in:28 Submitting:19 Submitted:1 Finished > > successfully:48 > > Progress: time: Sun, 10 Mar 2013 16:42:45 +0000 Selecting > > site:221 Stage in:48 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 16:42:54 +0000 Selecting > > site:221 Stage in:47 Active:1 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 16:43:00 +0000 Selecting > > site:221 Stage in:47 Stage out:1 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 16:43:02 +0000 Selecting > > site:221 Stage in:47 Finished successfully:49 > > Progress: time: Sun, 10 Mar 2013 16:43:05 +0000 Selecting > > site:220 Stage in:47 Submitted:1 Finished successfully:49 > > Progress: time: Sun, 10 Mar 2013 16:43:15 +0000 Selecting > > site:220 Stage in:48 Finished successfully:49 > > Progress: time: Sun, 10 Mar 2013 16:43:45 +0000 Selecting > > site:220 Stage in:48 Finished successfully:49 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > -> BufferingChannel, > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > -> BufferingChannel} > > Context: service-60859 > > Meta context: service-60519 > > Progress: time: Sun, 10 Mar 2013 16:43:59 +0000 Selecting > > site:220 Stage in:47 Active:1 Finished successfully:49 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > -> BufferingChannel, > > null at 
id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > -> BufferingChannel} > > Context: service-60663 > > Meta context: service-60519 > > Progress: time: Sun, 10 Mar 2013 16:44:05 +0000 Selecting > > site:220 Stage in:47 Stage out:1 Finished successfully:49 > > Progress: time: Sun, 10 Mar 2013 16:44:07 +0000 Selecting > > site:220 Stage in:47 Finished successfully:50 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > -> BufferingChannel, > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > -> BufferingChannel} > > Context: service-60081 > > Meta context: service-60519 > > Progress: time: Sun, 10 Mar 2013 16:44:09 +0000 Selecting > > site:219 Stage in:45 Submitting:1 Active:2 Finished > > successfully:50 > > Execution failed: > > Exception in getlanduse: > > Arguments: > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb] > > Host: beagle > > Directory: > > modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l > > > > Caused by: > > Shutting down worker > > getLandUse, modis02.swift, line 20 > > error null > > > > real 4m27.007s > > user 2m44.221s > > sys 0m3.448s > > + mv /home/wilde/.swift/runs/current/run034.1362933583 > > /home/wilde/.swift/runs/completed > > midway001$ > > > > > > ----- Original Message ----- > > > From: "Mihael Hategan" > > > To: 
"Michael Wilde" > > > Cc: "Swift Devel" > > > Sent: Sunday, March 10, 2013 1:36:26 AM > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > midway to beagle > > > > > > Please try now. I made some changes: > > > > > > 1. start the service with "-l" so that things in your .profile > > > (such > > > as > > > module load sun-java) would be picked up. However, this also > > > means > > > that > > > you should unset X509_* stuff or the sshcl proxy forwarding will > > > not > > > work properly. > > > > > > 2. I fixed a bug that caused an extra connection to the coaster > > > service. > > > Normally the service connects back to the client and both use > > > that > > > connection. However, due to some changes in the way credentials > > > were > > > set > > > for jobs, and the fact that connections were looked up based on > > > both > > > hostname and credential, the coaster client would ignore the > > > existing > > > connection and create another one. The initial one with then time > > > out > > > at > > > some point causing the service to crash. > > > > > > Mihael > > > > > > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote: > > > > An update on this provider staging related issue: reducing > > > > filesize > > > > from 17MB to 600KB runs well. > > > > > > > > So seems like some kind of flow control or buffer management > > > > problem, possibly? > > > > > > > > May need to take that problem offline - would be a perfect test > > > > case for Yadu to develop a new stress test for. > > > > > > > > - Mike > > > > > > > > > > > > ----- Forwarded Message ----- > > > > From: "Michael Wilde" > > > > To: "David Kelly" > > > > Sent: Saturday, March 9, 2013 5:21:49 PM > > > > Subject: Re: runs for OSG talk > > > > > > > > OK, much better: with 600K files (5x5 reduction or 25X smaller) > > > > it > > > > works well, and fast (form midway to beagle!) 
> > > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > > > RunID: 20130309-2319-5zq0jrfg > > > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting > > > > site:269 Submitting:47 Submitted:1 > > > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting > > > > site:269 Stage in:1 Submitted:47 > > > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting > > > > site:269 Stage in:47 Active:1 > > > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting > > > > site:269 Stage in:46 Active:1 Stage out:1 > > > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting > > > > site:250 Stage in:19 Active:28 Stage out:1 Finished > > > > successfully:19 > > > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting > > > > site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 > > > > Finished successfully:41 > > > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting > > > > site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 > > > > Finished successfully:49 > > > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting > > > > site:220 Stage in:38 Active:1 Stage out:9 Finished > > > > successfully:49 > > > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting > > > > site:212 Stage in:30 Submitting:8 Stage out:9 Finished > > > > successfully:58 > > > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting > > > > site:203 Stage in:38 Submitting:8 Submitted:1 Finished > > > > successfully:67 > > > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting > > > > site:202 Stage in:19 Stage out:28 Finished successfully:68 > > > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting > > > > site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 > > > > Stage > > > > out:2 Finished successfully:97 > > > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting > > > > site:170 Stage in:31 Submitting:2 Stage out:14 Finished > > > > 
successfully:100 > > > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting > > > > site:162 Stage in:30 Submitting:10 Stage out:6 Finished > > > > successfully:109 > > > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting > > > > site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 > > > > Finished successfully:115 > > > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting > > > > site:154 Stage in:21 Active:10 Stage out:16 Finished > > > > successfully:116 > > > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting > > > > site:126 Stage in:20 Submitting:25 Submitted:1 Stage out:2 > > > > Finished successfully:143 > > > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting > > > > site:124 Stage in:31 Active:2 Stage out:15 Finished > > > > successfully:145 > > > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting > > > > site:110 Stage in:30 Submitting:14 Stage out:3 Finished > > > > successfully:160 > > > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting > > > > site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 > > > > Stage > > > > out:2 Finished successfully:163 > > > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting > > > > site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 > > > > Finished successfully:165 > > > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting > > > > site:78 > > > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 Finished > > > > successfully:191 > > > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting > > > > site:76 > > > > Stage in:30 Stage out:17 Finished successfully:194 > > > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting > > > > site:58 > > > > Stage in:29 Submitting:18 Active:1 Finished > > > > successfully:211 > > > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting > > > > site:58 > > > > Stage in:33 Active:3 Stage out:12 Finished successfully:211 > > > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting > 
> > > site:46 > > > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage > > > > out:14 > > > > Finished successfully:225 > > > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting > > > > site:30 > > > > Stage in:29 Active:14 Stage out:3 Finished successfully:241 > > > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting > > > > site:28 > > > > Stage in:28 Submitting:2 Stage out:17 Finished > > > > successfully:242 > > > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting > > > > site:10 > > > > Stage in:30 Submitting:17 Submitted:1 Finished > > > > successfully:259 > > > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting > > > > site:10 > > > > Stage in:35 Stage out:13 Finished successfully:259 > > > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > > > > Submitting:6 Submitted:3 Stage out:15 Finished > > > > successfully:272 > > > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 > > > > Active:5 Stage out:14 Finished successfully:288 > > > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > > > successfully:317 > > > > > > > > real 0m58.953s > > > > user 0m32.573s > > > > sys 0m1.263s > > > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > > > /home/wilde/.swift/runs/completed > > > > midway001$ > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > From: "David Kelly" > > > > > To: "Michael Wilde" > > > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > Yep - I had a version where the input files were in a very > > > > > similar > > > > > format (PGM, 1 byte per pixel). I'll add that back, but > > > > > without > > > > > the > > > > > small PGM header in the files. 
> > > > >
> > > > > ----- Original Message -----
> > > > >
> > > > > From: "Michael Wilde"
> > > > > To: "David Kelly"
> > > > > Sent: Saturday, March 9, 2013 5:04:43 PM
> > > > > Subject: Re: runs for OSG talk
> > > > >
> > > > > I think we need to cut down the size of these files for a demo
> > > > > (although they are great for a stress test).
> > > > >
> > > > > First, the RGB format by itself uses 3 bytes per pixel when it only
> > > > > needs one (for land use).
> > > > >
> > > > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4).
> > > > >
> > > > > I tried that using simple convert statements, but it always seems to
> > > > > yield a file exactly double what it should be.
> > > > >
> > > > > More on this later; was hoping to get things working "as is" first.
> > > > >
> > > > > I assume you could get the perl code to work on one-byte-per-pixel
> > > > > instead of the default 3 for the convert rgb format?
> > > > >
> > > > > - Mike
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "David Kelly"
> > > > > > To: "Michael Wilde"
> > > > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > > That would probably be a good idea for a new script, to show how to
> > > > > > stage apps like that. For now I updated the scripts on lustre..
> > > > > > hopefully that helps.
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >
> > > > > > From: "Michael Wilde"
> > > > > > To: "David Kelly"
> > > > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > > OK, I see that it's trying to run getlanduse.sh from your /lustre dir
> > > > > > on beagle, which is different than the one I've got checked out.
> > > > > > It > > > > > > seems to get an error in a stderr redirect??? Let me se > > > > > > what I > > > > > > need > > > > > > to do to get the beagle side in sync. > > > > > > > > > > > > Seems like since these are perl scripts, we should make the > > > > > > app() > > > > > > /bin/sh and send the script as data, perhaps? > > > > > > > > > > > > - Mike > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > OK, making progress. Now I dialed down the throttle and > > > > > > > node > > > > > > > counts > > > > > > > to 48 jobs. > > > > > > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting > > > > > > > site:269 > > > > > > > Submitting:47 Submitted:1 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:1 Submitted:47 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:25 Submitted:23 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:48 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:48 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:48 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:48 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:47 Active:1 > > > > > > > Progress: time: Sat, 09 Mar 2013 
22:16:27 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:36 Active:12 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:24 Active:24 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:24 Active:23 Stage out:1 > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting > > > > > > > site:269 > > > > > > > Stage in:14 Active:33 Stage out:1 > > > > > > > Execution failed: > > > > > > > Exception in getlanduse: > > > > > > > Arguments: > > > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > > > Host: beagle > > > > > > > Directory: > > > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > > > > > Caused by: > > > > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > > > failed > > > > > > > with an exit code of 1 > > > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > > > > > real 2m31.463s > > > > > > > user 1m33.238s > > > > > > > sys 0m2.160s > > > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > > > /home/wilde/.swift/runs/completed > > > > > > > midway001$ > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. 
The run dir I used was > > > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > > > > > Not yet sure what the diff is... > > > > > > > > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on eth4 > > > > > > > > > (128.135.112.71 > > > > > > > > > for midway-login1), not a local address or an > > > > > > > > > infiniband > > > > > > > > > address. > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > I just got it working. 
I had to adjust for the > > > > > > > > > differences in > > > > > > > > > my > > > > > > > > > username on Beagle/Midway, then I had to set > > > > > > > > > GLOBUS_HOSTNAME > > > > > > > > > on > > > > > > > > > Midway to the IP address, rather than the full > > > > > > > > > hostname > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "David Kelly" > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and midway? > > > > > > > > > > > > > > > > > > Yes. And I verified that I can ssh to login4 on > > > > > > > > > beagle > > > > > > > > > from > > > > > > > > > my > > > > > > > > > midway > > > > > > > > > session (as indeed the scp's of the proxy files seem > > > > > > > > > to > > > > > > > > > be > > > > > > > > > working) > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK. 
> > > > > > > > > >
> > > > > > > > > > Ignore what I said about "problem finding java" - that's code in
> > > > > > > > > > the very long escaped shell command that gets sent to the remote
> > > > > > > > > > side. I don't *think* that's the problem.
> > > > > > > > > >
> > > > > > > > > > I also verified that beagle can connect to ports 50001 etc on
> > > > > > > > > > swift.rcc, and that seems OK.
> > > > > > > > > >
> > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on the
> > > > > > > > > > midway side. And the beagle side seems to be connecting there.
> > > > > > > > > >
> > > > > > > > > > I'm a bit confused about the timestamps I see for the proxy
> > > > > > > > > > expiration time, but am not yet suspicious of that (although it
> > > > > > > > > > seems less than 5 hours past GMT... not sure.)
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > I'm seeing the same error now..
looking into it > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > Looking deeper I see that the logs show problems > > > > > > > > > > > with > > > > > > > > > > > finding > > > > > > > > > > > Java, > > > > > > > > > > > I > > > > > > > > > > > assume on beagle, ans also service ending > > > > > > > > > > > (presumably > > > > > > > > > > > coaster > > > > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > > > > > > > I'll dig into these two. > > > > > > > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle which I > > > > > > > > > > > think > > > > > > > > > > > answers > > > > > > > > > > > my > > > > > > > > > > > question about security. > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it to > > > > > > > > > > > > work, > > > > > > > > > > > > same > > > > > > > > > > > > error > > > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with automatic > > > > > > > > > > > > coasters, > > > > > > > > > > > > what > > > > > > > > > > > > configuration (sites env etc) did you use? 
> > > > > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to the > > > > > > > > > > > > midway > > > > > > > > > > > > hosts > > > > > > > > > > > > and > > > > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create a > > > > > > > > > > > > proxy > > > > > > > > > > > > etc? > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the > > > > > > > > > > > > > default > > > > > > > > > > > > > templates > > > > > > > > > > > > > is > > > > > > > > > > > > > to > > > > > > > > > > > > > create > > > > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if > > > > > > > > > > > > > that's > > > > > > > > > > > > > what > > > > > > > > > > > > > you > > > > > > > > > > > > > mean > > > > > > > > > > > > > by > > > > > > > > > > > > > a local sites dir or not). But you are right > > > > > > > > > > > > > about > > > > > > > > > > > > > Midway > > > > > > > > > > > > > - > > > > > > > > > > > > > I > > > > > > > > > > > > > have > > > > > > > > > > > > > noticed that when using modis it will > > > > > > > > > > > > > sometimes > > > > > > > > > > > > > get > > > > > > > > > > > > > stuck > > > > > > > > > > > > > when > > > > > > > > > > > > > it > > > > > > > > > > > > > goes to a queue that is busy. 
Ideally swift > > > > > > > > > > > > > replication > > > > > > > > > > > > > would > > > > > > > > > > > > > be > > > > > > > > > > > > > able to help better handle that, but I > > > > > > > > > > > > > haven't > > > > > > > > > > > > > had > > > > > > > > > > > > > much > > > > > > > > > > > > > luck > > > > > > > > > > > > > with > > > > > > > > > > > > > that yet. Another way around this may be to > > > > > > > > > > > > > add > > > > > > > > > > > > > this > > > > > > > > > > > > > to > > > > > > > > > > > > > the > > > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It went > > > > > > > > > > > > > to > > > > > > > > > > > > > swift-devel > > > > > > > > > > > > > for > > > > > > > > > > > > > discussion but was never fixed. I think it is > > > > > > > > > > > > > relatively > > > > > > > > > > > > > simple > > > > > > > > > > > > > though.. probably worth fixing before > > > > > > > > > > > > > release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free > > > > > > > > > > > > > to > > > > > > > > > > > > > stay > > > > > > > > > > > > > Tue > > > > > > > > > > > > > night > > > > > > > > > > > > > to > > > > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. 
> > > > > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I can > > > > > > > > > > > > > modify > > > > > > > > > > > > > the > > > > > > > > > > > > > sites > > > > > > > > > > > > > templates; thats not working for me either > > > > > > > > > > > > > yet. > > > > > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or > > > > > > > > > > > > > sandyb > > > > > > > > > > > > > (but > > > > > > > > > > > > > not > > > > > > > > > > > > > both) > > > > > > > > > > > > > and > > > > > > > > > > > > > ensure 1-node jobs, because either queue can > > > > > > > > > > > > > get > > > > > > > > > > > > > filled > > > > > > > > > > > > > and > > > > > > > > > > > > > not > > > > > > > > > > > > > yield an idle node for a long time. maybe > > > > > > > > > > > > > need to > > > > > > > > > > > > > fiddle > > > > > > > > > > > > > jobsPerNode > > > > > > > > > > > > > to get at least 1 core when the system is > > > > > > > > > > > > > busy > > > > > > > > > > > > > and > > > > > > > > > > > > > *pretend* > > > > > > > > > > > > > that > > > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That > > > > > > > > > > > > > isnt > > > > > > > > > > > > > working > > > > > > > > > > > > > because > > > > > > > > > > > > > the > > > > > > > > > > > > > template sites file is wrong in swift 0.94 > > > > > > > > > > > > > rc4. > > > > > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > > > produced - > > > > > > > > > > > > > I > > > > > > > > > > > > > thought > > > > > > > > > > > > > we > > > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > > > problem > > > > > > > > > > > > > with > > > > > > > > > > > > > that > > > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I think > > > > > > > > > > > > > > the > > > > > > > > > > > > > > most > > > > > > > > > > > > > > interesting/useful talks will be on > > > > > > > > > > > > > > Tuesday. > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > I'll > > > > > > > > > > > > > > come > > > > > > > > > > > > > > to > > > > > > > > > > > > > > Argonne to work on any loose ends and put > > > > > > > > > > > > > > the > > > > > > > > > > > > > > finishing > > > > > > > > > > > > > > touches > > > > > > > > > > > > > > on > > > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > > > on > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > afternoon/evening. I have a hotel booked > > > > > > > > > > > > > > for > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we > > > > > > > > > > > > > > talked > > > > > > > > > > > > > > about. 
> > > > > > > > > > > > > > I'm > > > > > > > > > > > > > > pretty > > > > > > > > > > > > > > sure > > > > > > > > > > > > > > I > > > > > > > > > > > > > > have working configurations for everything > > > > > > > > > > > > > > we > > > > > > > > > > > > > > talked > > > > > > > > > > > > > > about, > > > > > > > > > > > > > > so > > > > > > > > > > > > > > I > > > > > > > > > > > > > > think it's really just a matter of plugging > > > > > > > > > > > > > > in > > > > > > > > > > > > > > the > > > > > > > > > > > > > > apps. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im > > > > > > > > > > > > > > looking > > > > > > > > > > > > > > into > > > > > > > > > > > > > > the > > > > > > > > > > > > > > run > > > > > > > > > > > > > > options > > > > > > > > > > > > > > now. Im hoping to try a few... WIll see how > > > > > > > > > > > > > > much > > > > > > > > > > > > > > help > > > > > > > > > > > > > > I > > > > > > > > > > > > > > need. > > > > > > > > > > > > > > Have > > > > > > > > > > > > > > you decided on a driving time and made > > > > > > > > > > > > > > hotel > > > > > > > > > > > > > > arrangements? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever > > > > > > > > > > > > > > portion > > > > > > > > > > > > > > of > > > > > > > > > > > > > > the > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > meeting > > > > > > > > > > > > > > you > > > > > > > > > > > > > > feel is of value. The only thing I ask is > > > > > > > > > > > > > > that > > > > > > > > > > > > > > for > > > > > > > > > > > > > > Wed > > > > > > > > > > > > > > and > > > > > > > > > > > > > > Thu > > > > > > > > > > > > > > you > > > > > > > > > > > > > > stay available online for user-support or > > > > > > > > > > > > > > other > > > > > > > > > > > > > > assistance > > > > > > > > > > > > > > needs > > > > > > > > > > > > > > that come up here. And that you engage with > > > > > > > > > > > > > > people > > > > > > > > > > > > > > that > > > > > > > > > > > > > > can > > > > > > > > > > > > > > help > > > > > > > > > > > > > > us > > > > > > > > > > > > > > develop the Swift user community and > > > > > > > > > > > > > > reliable > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > usage. > > > > > > > > > > > > > > Rob, > > > > > > > > > > > > > > Marco, > > > > > > > > > > > > > > Lincoln, and Suchandra would be good to > > > > > > > > > > > > > > hang > > > > > > > > > > > > > > out > > > > > > > > > > > > > > with > > > > > > > > > > > > > > and > > > > > > > > > > > > > > they > > > > > > > > > > > > > > can > > > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via a > > > > > > > > > > > > > > UChicago > > > > > > > > > > > > > > travel > > > > > > > > > > > > > > expense > > > > > > > > > > > > > > report. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny bit > > > > > > > > > > > > > > of > > > > > > > > > > > > > > additional > > > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > > > funds to make Swift do smarter data > > > > > > > > > > > > > > management > > > > > > > > > > > > > > on > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > sites > > > > > > > > > > > > > > (and > > > > > > > > > > > > > > in > > > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > > > storage > > > > > > > > > > > > > > elements/services/tools will be valuable > > > > > > > > > > > > > > for > > > > > > > > > > > > > > that > > > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just focus > > > > > > > > > > > > > > on > > > > > > > > > > > > > > the > > > > > > > > > > > > > > talk, > > > > > > > > > > > > > > OK? > > > > > > > > > > > > > > Im > > > > > > > > > > > > > > hoping > > > > > > > > > > > > > > we have slides frozen by Monday. > > > > > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or > > > > > > > > > > > > > > other > > > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > > > tests > > > > > > > > > > > > > > to cover the "routes" we discussed, that > > > > > > > > > > > > > > would > > > > > > > > > > > > > > pave > > > > > > > > > > > > > > the > > > > > > > > > > > > > > way > > > > > > > > > > > > > > for > > > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sound good? 
Let me know of any concerns > > > > > > > > > > > > > > (other > > > > > > > > > > > > > > than > > > > > > > > > > > > > > the > > > > > > > > > > > > > > fact > > > > > > > > > > > > > > that > > > > > > > > > > > > > > this is a tad rushed ;) > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks and regards, > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > Michael Wilde > > > > > > > > > > > > > > Computation Institute, University of > > > > > > > > > > > > > > Chicago > > > > > > > > > > > > > > Mathematics and Computer Science Division > > > > > > > > > > > > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > > > From wilde at mcs.anl.gov Sun Mar 10 16:28:08 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 10 Mar 2013 16:28:08 -0500 (CDT) Subject: [Swift-devel] Swift reference manual and syntax specification In-Reply-To: Message-ID: <993449932.1291539.1362950888769.JavaMail.root@mcs.anl.gov> Hi All, We have an activity underway within the ExTENCI project to improve Swift's tutorial and documentation, in collaboration with Amy Apon (chair of Computer Science at Clemson and OSG collaborator) and Clemson graduate student Eric Skogen. 
I suggested that we use support questions on the swift-* email lists as one guide to where documentation needs improvement, citing a recent question from a new user who didn't understand what can go in an app() function.

Amy recently suggested that we publish a syntax specification for Swift (which is a great idea and long overdue) and pointed out that the syntax spec could/would have made clear to this new user that you can't put an assignment in an app() definition.

What I believe we need for Swift is a top-notch User Guide that can double as a hands-on tutorial, and a separate, concise but complete Reference Manual that precisely states language syntax and semantics, and can be used by both users (to answer questions not made clear in the User Guide) and developers (to discuss subtleties or proposed changes in semantics and to deal with implementation issues).

While the User Guide needs to be organized in a reasonable order for exposing the language, the Reference Manual needs to be laid out more top-down (in terms of compilation units) and bottom-up (in terms of lexical issues).

Eric, David, and Ketan are discussing and working on the User Guide. I've asked Ketan to see if the informal BNF-like syntax of the classic K&R C book can be readily adapted to Swift, and also suggested that the concise, compact style of the Swift/T User Guide is in fact more of a reference manual already, and could form the basis of a complete reference manual that ideally can serve both Swift/K and Swift/T.

We'll need, and will be having, a lot more discussion on both the User Guide and Reference Manual, but with this e-mail I wanted to move all such discussion to this list (swift-devel).
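For concreteness, here is a minimal, untested sketch of how the script from that support question could be restructured so that the app() body holds only a single command-line template. It reuses the names from the user's original script (messagefile, parse, hello.txt, compile.txt). Moving the assignment to top level and dropping the "type string;" declaration follow the replies quoted below; the comment syntax and exact layout are assumptions, not a statement of how the final documentation will present this.

```
type messagefile;

// An app() body may contain exactly one command-line template,
// so no variable declarations or assignments appear here.
app (messagefile t) parse(messagefile n) {
    echo @extractint(n) stdout=@filename(t);
}

messagefile outfile <"hello.txt">;
messagefile input <"compile.txt">;

// The assignment that caused the syntax error lives outside the
// app() definition (top level here; a compound function also works).
// Note that v is still unused, as pointed out in the thread.
string v = @regexp("abcdefghi", "c(def)g", "monkey");

outfile = parse(input);
```

A syntax specification would state this rule once (roughly: an app body is one command followed by a semicolon), which is the kind of answer the Reference Manual is meant to give.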
- Mike

----- Original Message -----
> From: "Amy Apon"
> To: "Michael Wilde"
> Cc: "Eric Skogen", "David Kelly", "Ketan Maheshwari"
> Sent: Sunday, March 10, 2013 1:05:54 PM
> Subject: Re: [Swift-user] Variable Declaration
>
> Mike,
>
> This is a good idea -- looking at support questions to understand
> where the "pain points" are.
>
> This particular question is related to our conversation on the last
> call. If we could come up with a grammar or grammar-like description
> of Swift, then the concept below (that an app() function's body is
> restricted to contain a single command line template and nothing
> else) could be explained with a tutorial about the grammar. Are we
> still looking at this?
>
> I would like to schedule another call that includes you, me, and
> Eric, I think, to keep in touch. How does your time look on
> Wednesday afternoon this week?
>
> Amy
>
> Amy Apon, Ph.D.
> Professor and Chair, Division of Computer Science, School of
> Computing
> Clemson University
> 221 McAdams
> Phone: 864-656-5769
>
> On Sun, Mar 10, 2013 at 12:30 PM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
>
> Eric, All,
>
> I think we can gain a lot of insight into documentation/training
> needs by observing the questions users come with when first using
> Swift, such as this one.
>
> - restrictions on what can be in an app() body, and how you code
> around them
>
> - guidance on the use of types and type names
>
> In other words, we can look at support questions and ask "how can the
> learning roadmap make such support questions less likely to happen".
>
> Mike
>
> ----- Forwarded Message -----
> From: "Tim Armstrong" < tim.g.armstrong at gmail.com >
> To: "Michael Wilde" < wilde at mcs.anl.gov >
> Sent: Saturday, March 9, 2013 5:56:39 PM
> Subject: Re: [Swift-user] Variable Declaration
>
> What Mike said.
> I was also going to say that your line:
>
> type string;
>
> may cause problems: string is a built-in type and you don't need to
> define it.
>
> - Tim
>
> On Sat, Mar 9, 2013 at 5:55 PM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
>
> Jay,
>
> An app() function's body is restricted to contain a single command
> line template and nothing else. So instead of:
>
> app (messagefile t) parse(messagefile n) {
>     string v = @regexp("abcdefghi", "c(def)g","monkey");
>     echo @extractint(n) stdout=@filename(t);
> }
>
> ...you need to create a separate ("compound") function to do things like
> string v=(). You can embed arbitrary expressions in the app() body,
> but it can only have one semicolon-terminated command. That's the
> likely cause of the syntax error.
>
> Also note that your statement "string v = ..." is creating a value
> that as far as I can see is not used anywhere, so you may want to
> re-examine that logic.
>
> - Mike
>
> ----- Original Message -----
> > From: "Jay Lee" < jlee734 at gmail.com >
> > To: swift-user at ci.uchicago.edu
> > Sent: Saturday, March 9, 2013 5:48:17 PM
> > Subject: [Swift-user] Variable Declaration
> >
> > Hello,
> >
> > I just started with swift today, so excuse my lack of knowledge. I
> > have the following code:
> >
> > type messagefile;
> > type string;
> >
> > app (messagefile t) parse(messagefile n) {
> >     string v = @regexp("abcdefghi", "c(def)g","monkey");
> >     echo @extractint(n) stdout=@filename(t);
> > }
> >
> > app (messagefile t) greeting() {
> >     echo "Hello, world!" stdout=@filename(t);
> > }
> >
> > messagefile outfile <"hello.txt">;
> > messagefile input <"compile.txt">;
> >
> > outfile = parse(input);
> >
> > I get an error: Could not compile SwiftScript source: line 6:10:
> > expecting a semicolon, found '='
> >
> > I found that there are mappers that can be used to declare variables
> > (namely files), but are these required?
> >
> >
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
>
>
From wilde at mcs.anl.gov Sun Mar 10 16:37:49 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 16:37:49 -0500 (CDT)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <884747621.1285936.1362946853450.JavaMail.root@mcs.anl.gov>
Message-ID: <1649997961.1291675.1362951469901.JavaMail.root@mcs.anl.gov>

Mihael, it seems that the problem is still there under the current trunk - see below.

This is in: midway:/home/wilde/osgdemo/modis/svn/run035.

The "cog modified locally" is a hopefully inconsequential change in worker.pl where I open stdin to /dev/null rather than close it, before launching an app, to remedy an unrelated MPI problem.
- Mike Swift trunk swift-r6362 cog-r3637 (cog modified locally) RunID: 20130310-2055-4lqjiftd Progress: time: Sun, 10 Mar 2013 20:55:52 +0000 Progress: time: Sun, 10 Mar 2013 20:56:06 +0000 Selecting site:269 Submitting:47 Submitted:1 Progress: time: Sun, 10 Mar 2013 20:56:12 +0000 Selecting site:269 Stage in:1 Submitted:47 Progress: time: Sun, 10 Mar 2013 20:56:16 +0000 Selecting site:269 Stage in:25 Submitted:23 Progress: time: Sun, 10 Mar 2013 20:56:22 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 20:56:52 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 20:57:22 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 20:57:52 +0000 Selecting site:269 Stage in:48 Progress: time: Sun, 10 Mar 2013 20:58:19 +0000 Selecting site:269 Stage in:47 Active:1 Progress: time: Sun, 10 Mar 2013 20:58:20 +0000 Selecting site:269 Stage in:26 Active:22 Progress: time: Sun, 10 Mar 2013 20:58:22 +0000 Selecting site:269 Stage in:24 Active:24 Progress: time: Sun, 10 Mar 2013 20:58:24 +0000 Selecting site:269 Stage in:23 Active:25 Progress: time: Sun, 10 Mar 2013 20:58:26 +0000 Selecting site:269 Active:47 Stage out:1 Progress: time: Sun, 10 Mar 2013 20:58:27 +0000 Selecting site:260 Stage in:7 Submitting:1 Submitted:1 Active:39 Finished successfully:9 Progress: time: Sun, 10 Mar 2013 20:58:28 +0000 Selecting site:258 Stage in:9 Submitting:1 Submitted:1 Active:24 Stage out:13 Finished successfully:11 Progress: time: Sun, 10 Mar 2013 20:58:29 +0000 Selecting site:245 Stage in:23 Submitted:1 Active:24 Finished successfully:24 Progress: time: Sun, 10 Mar 2013 20:58:31 +0000 Selecting site:245 Stage in:24 Active:23 Stage out:1 Finished successfully:24 Progress: time: Sun, 10 Mar 2013 20:58:32 +0000 Selecting site:245 Stage in:24 Active:23 Finished successfully:25 Progress: time: Sun, 10 Mar 2013 20:58:34 +0000 Selecting site:244 Stage in:24 Submitting:1 Stage out:22 Finished successfully:26 Progress: time: Sun, 10 Mar 2013 
20:58:35 +0000 Selecting site:221 Stage in:25 Submitting:22 Submitted:1 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 20:58:52 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 20:59:22 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 20:59:52 +0000 Selecting site:221 Stage in:48 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 20:59:56 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 21:00:02 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 Progress: time: Sun, 10 Mar 2013 21:00:05 +0000 Selecting site:221 Stage in:47 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] -> BufferingChannel, null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60018 Meta context: service-60734 Progress: time: Sun, 10 Mar 2013 21:00:07 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] -> BufferingChannel, null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] -> 
GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60263 Meta context: service-60734 Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] -> BufferingChannel, null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} Context: service-60408 Meta context: service-60734 Progress: time: Sun, 10 Mar 2013 21:00:18 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 Progress: time: Sun, 10 Mar 2013 21:00:19 +0000 Selecting site:220 Stage in:46 Active:2 Finished successfully:49 Execution failed: Exception in getlanduse: Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h12v09.rgb] Host: beagle Directory: modis02-20130310-2055-4lqjiftd/jobs/y/getlanduse-yht64f6l Caused by: Shutting down worker getLandUse, modis02.swift, line 20 error null real 4m29.509s user 2m45.981s sys 0m3.520s ----- Original Message ----- > From: "Michael Wilde" > To: "Mihael Hategan" > Cc: "Swift Devel" > Sent: Sunday, March 10, 2013 3:20:53 PM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > Duh. Thank you. I didn't build a new release, was using same 0.94 > RC4 code. > > Sorry about that. Will retest. 
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Mihael Hategan"
> > To: "Michael Wilde"
> > Cc: "Swift Devel"
> > Sent: Sunday, March 10, 2013 3:06:25 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > midway to beagle
> >
> > ChannelContext Notifying commands and handlers about exception
> > org.globus.cog.karajan.workflow.service.TimeoutException: Channel
> > timed out. lastTime=940817-071255.807, now=130310-164156.506,
> > channel=GSSSChannel-1463847073(1)[service-60519]
> >
> > Are you sure you are running with the latest code? There was a
> > (mostly inconsequential) bug before that set lastTime to
> > Long.MAX_VALUE before creating that exception. That was fixed. Your
> > message indicates the code you are using does not have that fix
> > (year xx94 is what comes out of Long.MAX_VALUE).
> >
> > I gotta go now, but I'll come back later and check some more. There
> > is something weird going on there besides that.
> >
> > Mihael
> >
> > On Sun, 2013-03-10 at 12:01 -0500, Michael Wilde wrote:
> > > Here's run034: seems to be a bit better, but still dies. This is
> > > with throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc to
> > > beagle. 17MB files. Still seems to curiously die about 4 mins
> > > into the run, which suggests some kind of timeout is still
> > > lurking???
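The lastTime=940817-071255.807 value in the exception quoted above is consistent with the largest Java long being treated as epoch milliseconds and printed with a two-digit year. The sketch below is not taken from the CoG source: the class name and the "yyMMdd-HHmmss.SSS" pattern are assumptions inferred from the shape of the logged values, and UTC is assumed because the run's progress lines are printed at +0000.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class MaxValueTimestamp {
    // Assumed pattern, inferred from log values such as "130310-164156.506".
    private static final String PATTERN = "yyMMdd-HHmmss.SSS";

    /** Formats epoch milliseconds the way the log lines appear to be formatted. */
    public static String format(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: log times are UTC
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE milliseconds lands in year 292278994; "yy" keeps only "94".
        System.out.println(format(Long.MAX_VALUE)); // 940817-071255.807
    }
}
```

Under those assumptions the output matches the lastTime field character for character, which supports the reading that the run was made with a build that predates the fix.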
> > > > > > - Mike > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > RunID: 20130310-1639-kyb8hca9 > > > Progress: time: Sun, 10 Mar 2013 16:39:45 +0000 > > > Progress: time: Sun, 10 Mar 2013 16:39:56 +0000 Selecting > > > site:269 Submitting:47 Submitted:1 > > > Progress: time: Sun, 10 Mar 2013 16:40:01 +0000 Selecting > > > site:269 Stage in:1 Submitted:47 > > > Progress: time: Sun, 10 Mar 2013 16:40:15 +0000 Selecting > > > site:269 Stage in:48 > > > Progress: time: Sun, 10 Mar 2013 16:40:45 +0000 Selecting > > > site:269 Stage in:48 > > > Progress: time: Sun, 10 Mar 2013 16:41:15 +0000 Selecting > > > site:269 Stage in:48 > > > Progress: time: Sun, 10 Mar 2013 16:41:45 +0000 Selecting > > > site:269 Stage in:48 > > > Progress: time: Sun, 10 Mar 2013 16:42:11 +0000 Selecting > > > site:269 Stage in:47 Active:1 > > > Progress: time: Sun, 10 Mar 2013 16:42:12 +0000 Selecting > > > site:269 Stage in:41 Active:7 > > > Progress: time: Sun, 10 Mar 2013 16:42:13 +0000 Selecting > > > site:269 Stage in:23 Active:25 > > > Progress: time: Sun, 10 Mar 2013 16:42:15 +0000 Selecting > > > site:269 Active:48 > > > Progress: time: Sun, 10 Mar 2013 16:42:17 +0000 Selecting > > > site:269 Active:47 Stage out:1 > > > Progress: time: Sun, 10 Mar 2013 16:42:18 +0000 Selecting > > > site:268 Stage in:1 Active:46 Stage out:1 Finished > > > successfully:1 > > > Progress: time: Sun, 10 Mar 2013 16:42:19 +0000 Selecting > > > site:265 Stage in:3 Submitted:1 Active:42 Stage out:2 > > > Finished successfully:4 > > > Progress: time: Sun, 10 Mar 2013 16:42:20 +0000 Selecting > > > site:258 Stage in:6 Submitting:5 Active:23 Stage out:13 > > > Finished successfully:12 > > > Progress: time: Sun, 10 Mar 2013 16:42:21 +0000 Selecting > > > site:244 Stage in:24 Submitting:1 Active:20 Stage out:3 > > > Finished successfully:25 > > > Progress: time: Sun, 10 Mar 2013 16:42:23 +0000 Selecting > > > site:241 Stage in:25 Submitting:3 Stage out:19 Finished > > > 
successfully:29 > > > Progress: time: Sun, 10 Mar 2013 16:42:24 +0000 Selecting > > > site:221 Stage in:28 Submitting:19 Submitted:1 Finished > > > successfully:48 > > > Progress: time: Sun, 10 Mar 2013 16:42:45 +0000 Selecting > > > site:221 Stage in:48 Finished successfully:48 > > > Progress: time: Sun, 10 Mar 2013 16:42:54 +0000 Selecting > > > site:221 Stage in:47 Active:1 Finished successfully:48 > > > Progress: time: Sun, 10 Mar 2013 16:43:00 +0000 Selecting > > > site:221 Stage in:47 Stage out:1 Finished successfully:48 > > > Progress: time: Sun, 10 Mar 2013 16:43:02 +0000 Selecting > > > site:221 Stage in:47 Finished successfully:49 > > > Progress: time: Sun, 10 Mar 2013 16:43:05 +0000 Selecting > > > site:220 Stage in:47 Submitted:1 Finished successfully:49 > > > Progress: time: Sun, 10 Mar 2013 16:43:15 +0000 Selecting > > > site:220 Stage in:48 Finished successfully:49 > > > Progress: time: Sun, 10 Mar 2013 16:43:45 +0000 Selecting > > > site:220 Stage in:48 Finished successfully:49 > > > Channels: > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > -> BufferingChannel, > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > -> BufferingChannel} > > > Context: service-60859 > > > Meta context: service-60519 > > > Progress: time: Sun, 10 Mar 2013 16:43:59 +0000 Selecting > > > site:220 Stage in:47 Active:1 Finished successfully:49 > > > Channels: > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > -> > > > 
GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > -> BufferingChannel, > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > -> BufferingChannel} > > > Context: service-60663 > > > Meta context: service-60519 > > > Progress: time: Sun, 10 Mar 2013 16:44:05 +0000 Selecting > > > site:220 Stage in:47 Stage out:1 Finished successfully:49 > > > Progress: time: Sun, 10 Mar 2013 16:44:07 +0000 Selecting > > > site:220 Stage in:47 Finished successfully:50 > > > Channels: > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > -> BufferingChannel, > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > -> > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > -> BufferingChannel} > > > Context: service-60081 > > > Meta context: service-60519 > > > Progress: time: Sun, 10 Mar 2013 16:44:09 +0000 Selecting > > > site:219 Stage in:45 Submitting:1 Active:2 Finished > > > successfully:50 > > > Execution failed: > > > Exception in getlanduse: > > > Arguments: > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb] > > > Host: beagle > > > Directory: > > > modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l > > > > > > Caused by: > > > Shutting down worker > > > getLandUse, 
> > > modis02.swift, line 20
> > > error null
> > >
> > > real 4m27.007s
> > > user 2m44.221s
> > > sys 0m3.448s
> > > + mv /home/wilde/.swift/runs/current/run034.1362933583
> > > /home/wilde/.swift/runs/completed
> > > midway001$
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Mihael Hategan"
> > > > To: "Michael Wilde"
> > > > Cc: "Swift Devel"
> > > > Sent: Sunday, March 10, 2013 1:36:26 AM
> > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > > midway to beagle
> > > >
> > > > Please try now. I made some changes:
> > > >
> > > > 1. Start the service with "-l" so that things in your .profile
> > > > (such as module load sun-java) would be picked up. However, this
> > > > also means that you should unset X509_* stuff or the sshcl proxy
> > > > forwarding will not work properly.
> > > >
> > > > 2. I fixed a bug that caused an extra connection to the coaster
> > > > service. Normally the service connects back to the client and both
> > > > use that connection. However, due to some changes in the way
> > > > credentials were set for jobs, and the fact that connections were
> > > > looked up based on both hostname and credential, the coaster client
> > > > would ignore the existing connection and create another one. The
> > > > initial one would then time out at some point, causing the service
> > > > to crash.
> > > >
> > > > Mihael
> > > >
> > > > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > > > > An update on this provider staging related issue: reducing
> > > > > filesize from 17MB to 600KB runs well.
> > > > >
> > > > > So seems like some kind of flow control or buffer management
> > > > > problem, possibly?
> > > > >
> > > > > May need to take that problem offline - would be a perfect
> > > > > test case for Yadu to develop a new stress test for.
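The extra-connection bug described in Mihael's second point can be pictured with a small lookup table. This is only an illustration of the mechanism he describes, not the coaster client's code: the class, the method names, and the string keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionLookupSketch {
    // Hypothetical cache keyed on host AND credential.
    private final Map<String, Integer> byHostAndCredential = new HashMap<>();
    // Hypothetical cache keyed on host only.
    private final Map<String, Integer> byHost = new HashMap<>();
    private int nextId = 1;

    /** A changed credential misses the cache, so a second connection is opened. */
    public int connectKeyedOnBoth(String host, String credential) {
        return byHostAndCredential.computeIfAbsent(host + "|" + credential, k -> nextId++);
    }

    /** Keyed on host alone, the existing connection is found and reused. */
    public int connectKeyedOnHost(String host) {
        return byHost.computeIfAbsent(host, k -> nextId++);
    }

    public static void main(String[] args) {
        ConnectionLookupSketch c = new ConnectionLookupSketch();
        int first = c.connectKeyedOnBoth("beagle", "credential-A");
        int second = c.connectKeyedOnBoth("beagle", "credential-B");
        // Different ids: the first connection is left idle and can later time out.
        System.out.println("keyed on both: " + first + " and " + second);
        System.out.println("keyed on host: " + c.connectKeyedOnHost("beagle")
                + " and " + c.connectKeyedOnHost("beagle"));
    }
}
```

With the two-part key, the same host yields two connection ids once the credential differs, which is the situation in which the original, now unused, connection would be left to time out.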
> > > > > > > > > > - Mike > > > > > > > > > > > > > > > ----- Forwarded Message ----- > > > > > From: "Michael Wilde" > > > > > To: "David Kelly" > > > > > Sent: Saturday, March 9, 2013 5:21:49 PM > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > OK, much better: with 600K files (5x5 reduction or 25X > > > > > smaller) > > > > > it > > > > > works well, and fast (form midway to beagle!) > > > > > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > > > > > RunID: 20130309-2319-5zq0jrfg > > > > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > > > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting > > > > > site:269 Submitting:47 Submitted:1 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting > > > > > site:269 Stage in:1 Submitted:47 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting > > > > > site:269 Stage in:47 Active:1 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting > > > > > site:269 Stage in:46 Active:1 Stage out:1 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting > > > > > site:250 Stage in:19 Active:28 Stage out:1 Finished > > > > > successfully:19 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting > > > > > site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 > > > > > Finished successfully:41 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 Selecting > > > > > site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 > > > > > Finished successfully:49 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting > > > > > site:220 Stage in:38 Active:1 Stage out:9 Finished > > > > > successfully:49 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting > > > > > site:212 Stage in:30 Submitting:8 Stage out:9 Finished > > > > > successfully:58 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting > > > > > site:203 Stage in:38 Submitting:8 Submitted:1 Finished > > > > > 
successfully:67 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting > > > > > site:202 Stage in:19 Stage out:28 Finished successfully:68 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting > > > > > site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 > > > > > Stage > > > > > out:2 Finished successfully:97 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting > > > > > site:170 Stage in:31 Submitting:2 Stage out:14 Finished > > > > > successfully:100 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting > > > > > site:162 Stage in:30 Submitting:10 Stage out:6 Finished > > > > > successfully:109 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting > > > > > site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 > > > > > Finished successfully:115 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting > > > > > site:154 Stage in:21 Active:10 Stage out:16 Finished > > > > > successfully:116 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting > > > > > site:126 Stage in:20 Submitting:25 Submitted:1 Stage > > > > > out:2 > > > > > Finished successfully:143 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting > > > > > site:124 Stage in:31 Active:2 Stage out:15 Finished > > > > > successfully:145 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting > > > > > site:110 Stage in:30 Submitting:14 Stage out:3 Finished > > > > > successfully:160 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting > > > > > site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 > > > > > Stage > > > > > out:2 Finished successfully:163 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting > > > > > site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 > > > > > Finished successfully:165 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting > > > > > site:78 > > > > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 
> > > > > Finished > > > > > successfully:191 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting > > > > > site:76 > > > > > Stage in:30 Stage out:17 Finished successfully:194 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting > > > > > site:58 > > > > > Stage in:29 Submitting:18 Active:1 Finished > > > > > successfully:211 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting > > > > > site:58 > > > > > Stage in:33 Active:3 Stage out:12 Finished > > > > > successfully:211 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting > > > > > site:46 > > > > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage > > > > > out:14 > > > > > Finished successfully:225 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting > > > > > site:30 > > > > > Stage in:29 Active:14 Stage out:3 Finished > > > > > successfully:241 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting > > > > > site:28 > > > > > Stage in:28 Submitting:2 Stage out:17 Finished > > > > > successfully:242 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting > > > > > site:10 > > > > > Stage in:30 Submitting:17 Submitted:1 Finished > > > > > successfully:259 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 Selecting > > > > > site:10 > > > > > Stage in:35 Stage out:13 Finished successfully:259 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > > > > > Submitting:6 Submitted:3 Stage out:15 Finished > > > > > successfully:272 > > > > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 > > > > > Active:5 Stage out:14 Finished successfully:288 > > > > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > > > > successfully:317 > > > > > > > > > > real 0m58.953s > > > > > user 0m32.573s > > > > > sys 0m1.263s > > > > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > > > > /home/wilde/.swift/runs/completed > > > > > midway001$ > > > > > > > > > > > > > > > > 
> > > > > ----- Original Message -----
> > > > > > From: "David Kelly"
> > > > > > To: "Michael Wilde"
> > > > > > Sent: Saturday, March 9, 2013 5:12:59 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > >
> > > > > > Yep - I had a version where the input files were in a very
> > > > > > similar format (PGM, 1 byte per pixel). I'll add that back, but
> > > > > > without the small PGM header in the files.
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >
> > > > > >
> > > > > > From: "Michael Wilde"
> > > > > > To: "David Kelly"
> > > > > > Sent: Saturday, March 9, 2013 5:04:43 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > >
> > > > > > I think we need to cut down the size of these files for a
> > > > > > demo (although they are great for a stress test).
> > > > > >
> > > > > > First, the RGB format by itself uses 3 bytes per pixel when
> > > > > > it only needs one (for land use)
> > > > > >
> > > > > > Second, we should cut down by a factor of 9 (3x3) or 16
> > > > > > (4x4).
> > > > > >
> > > > > > I tried that using simple convert statements, but it always
> > > > > > seems to yield a file exactly double what it should be.
> > > > > >
> > > > > > More on this later; was hoping to get things working "as
> > > > > > is" first.
> > > > > >
> > > > > > I assume you could get the perl code to work on
> > > > > > one-byte-per-pixel instead of the default 3 for the convert
> > > > > > rgb format?
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly"
> > > > > > > To: "Michael Wilde"
> > > > > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > >
> > > > > > > That would probably be a good idea for a new script, to
> > > > > > > show how to stage apps like that.
> > > > > > > For now I updated the scripts on lustre.. hopefully that helps.
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >
> > > > > > >
> > > > > > > From: "Michael Wilde"
> > > > > > > To: "David Kelly"
> > > > > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > >
> > > > > > > OK, I see that it's trying to run getlanduse.sh from your
> > > > > > > /lustre dir on beagle, which is different than the one I've
> > > > > > > got checked out. It seems to get an error in a stderr
> > > > > > > redirect??? Let me see what I need to do to get the beagle
> > > > > > > side in sync.
> > > > > > >
> > > > > > > Seems like since these are perl scripts, we should make
> > > > > > > the app() /bin/sh and send the script as data, perhaps?
> > > > > > >
> > > > > > > - Mike
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > > From: "Michael Wilde"
> > > > > > > > To: "David Kelly"
> > > > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > >
> > > > > > > > OK, making progress. Now I dialed down the throttle and
> > > > > > > > node counts to 48 jobs.
> > > > > > > > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Submitting:47 Submitted:1 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:1 Submitted:47 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:25 Submitted:23 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:48 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:48 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:48 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:48 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:47 Active:1 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:36 Active:12 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:24 Active:24 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > > > Stage in:24 Active:23 Stage out:1 > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 > > > > > > > > Selecting > > > > > > > > site:269 > > > > > > 
> > Stage in:14 Active:33 Stage out:1 > > > > > > > > Execution failed: > > > > > > > > Exception in getlanduse: > > > > > > > > Arguments: > > > > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > > > > Host: beagle > > > > > > > > Directory: > > > > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > > > > > > > Caused by: > > > > > > > > Application > > > > > > > > /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > > > > failed > > > > > > > > with an exit code of 1 > > > > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > > > > > > > real 2m31.463s > > > > > > > > user 1m33.238s > > > > > > > > sys 0m2.160s > > > > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > > > > /home/wilde/.swift/runs/completed > > > > > > > > midway001$ > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used was > > > > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > > > > > > > Not yet sure what the diff is... 
> > > > > > > > >
> > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > > > > >
> > > > > > > > > - Mike
> > > > > > > > >
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "David Kelly"
> > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Had to make sure I was using the IP address on eth4
> > > > > > > > > > (128.135.112.71 for midway-login1), not a local address
> > > > > > > > > > or an infiniband address.
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > From: "David Kelly"
> > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I just got it working.
> > > > > > > > > > I had to adjust for the differences in my username on
> > > > > > > > > > Beagle/Midway, then I had to set GLOBUS_HOSTNAME on
> > > > > > > > > > Midway to the IP address, rather than the full hostname
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > To: "David Kelly"
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Is your username the same on beagle and midway?
> > > > > > > > > >
> > > > > > > > > > Yes. And I verified that I can ssh to login4 on beagle
> > > > > > > > > > from my midway session (as indeed the scp's of the proxy
> > > > > > > > > > files seem to be working)
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > OK.
> > > > > > > > > > >
> > > > > > > > > > > Ignore what I said about "problem finding java" - that's
> > > > > > > > > > > code in the very long escaped shell command that gets
> > > > > > > > > > > sent to the remote side. I don't *think* that's the
> > > > > > > > > > > problem.
> > > > > > > > > > >
> > > > > > > > > > > I also verified that beagle can connect to ports 50001
> > > > > > > > > > > etc on swift.rcc, and that seems OK.
> > > > > > > > > > >
> > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu
> > > > > > > > > > > on the midway side. And the beagle side seems to be
> > > > > > > > > > > connecting there.
> > > > > > > > > > >
> > > > > > > > > > > I'm a bit confused about the timestamps I see for the
> > > > > > > > > > > proxy expiration time, but am not yet suspicious of that
> > > > > > > > > > > (although it seems less than 5 hours past GMT... not
> > > > > > > > > > > sure.)
> > > > > > > > > > >
> > > > > > > > > > > - Mike
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I'm seeing the same error now..
> > > > > > > > > > > > looking into it
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > Looking deeper I see that the logs show problems with finding Java, I assume on beagle, and also service ending (presumably coaster service on midway host).
> > > > > > > > > > > > I'll dig into these two.
> > > > > > > > > > > > I see that it scp's the proxies to beagle which I think answers my question about security.
> > > > > > > > > > > > - Mike
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > > > > > I've been experimenting but still can't get it to work, same error (can't connect to bootstrap port)
> > > > > > > > > > > > > When you tried ssh-cl to beagle with automatic coasters, what configuration (sites env etc) did you use?
> > > > > > > > > > > > > I verified that beagle can connect back to the midway hosts and ports.
> > > > > > > > > > > > > Do we need to specify security or create a proxy etc?
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > - Mike
> > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM
> > > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > > > One way you can override/customize the default templates is to create them in $HOME/.swift/sites (I'm not sure if that's what you mean by
> > > > > > > > > > > > > > a local sites dir or not). But you are right about Midway - I have noticed that when using modis it will sometimes get stuck when it goes to a queue that is busy. Ideally swift replication would be able to help better handle that, but I haven't had much luck with that yet. Another way around this may be to add this to the template:
> > > > > > > > > > > > > > key="slurm.exclusive">false
> > > > > > > > > > > > > > The swift.log issue was never fixed. It went to swift-devel for discussion but was never fixed. I think it is relatively simple though.. probably worth fixing before release.
> > > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM
> > > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay Tue night to avoid a 4hr drive after a long day.
> > > > > > > > > > > > > > I'm trying the modis demo.
> > > > > > > > > > > > > > I tried to create a local sites/ dir so I can modify the sites templates; that's not working for me either yet.
> > > > > > > > > > > > > > For midway, need to force to westmere or sandyb (but not both) and ensure 1-node jobs, because either queue can get filled and not yield an idle node for a long time.
> > > > > > > > > > > > > > Maybe need to fiddle jobsPerNode to get at least 1 core when the system is busy and *pretend* that it's a node.
> > > > > > > > > > > > > > So to get response I tried beagle-ssh; that isn't working because the template sites file is wrong in swift 0.94 rc4.
> > > > > > > > > > > > > > I also see that swift.log is still getting produced - I thought we eliminated that. Did it come back due to a problem with that fix?
> > > > > > > > > > > > > > I'll keep hacking; suggestions welcome.
> > > > > > > > > > > > > > - Mike
> > > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM
> > > > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > > > > Hi Mike,
> > > > > > > > > > > > > > > Looking more closely at the agenda, I think the most interesting/useful talks will be on Tuesday. Monday I'll come to Argonne to work on any loose ends and put the finishing touches on any slides/runs/scripts, then drive to Indianapolis on Monday afternoon/evening. I have a hotel booked for Monday night.
> > > > > > > > > > > > > > > I'll do some runs using the routes we talked about.
> > > > > > > > > > > > > > > I'm pretty sure I have working configurations for everything we talked about, so I think it's really just a matter of plugging in the apps.
> > > > > > > > > > > > > > > David
> > > > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM
> > > > > > > > > > > > > > > Subject: runs for OSG talk
> > > > > > > > > > > > > > > Hi David,
> > > > > > > > > > > > > > > I just wanted to let you know that I'm looking into the run options now. I'm hoping to try a few... Will see how much help I need. Have you decided on a driving time and made hotel arrangements?
> > > > > > > > > > > > > > > I would feel free to stay for whatever portion of the OSG meeting you feel is of value. The only thing I ask is that for Wed and Thu you stay available online for user-support or other assistance needs that come up here. And that you engage with people that can help us develop the Swift user community and reliable OSG usage. Rob, Marco, Lincoln, and Suchandra would be good to hang out with and they can introduce you to good contacts.
> > > > > > > > > > > > > > > Of course we will cover your expenses via a UChicago travel expense report.
> > > > > > > > > > > > > > > We'll be starting a project with a tiny bit of additional ExTENCI funds to make Swift do smarter data management on OSG sites (and in general) so anything you learn about OSG storage elements/services/tools will be valuable for that (srmcp, lcgcp, etc).
> > > > > > > > > > > > > > > Between now and your talk, let's just focus on the talk, OK? I'm hoping we have slides frozen by Monday.
> > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other hello-world-like tests to cover the "routes" we discussed, that would pave the way for plugging in the real app examples.
> > > > > > > > > > > > > > > Sound good?
> > > > > > > > > > > > > > > Let me know of any concerns (other than the fact that this is a tad rushed ;)
> > > > > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > > > > > - Mike
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > > > > Argonne National Laboratory
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

From tim.g.armstrong at gmail.com Sun Mar 10 18:07:56 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Sun, 10 Mar 2013 18:07:56 -0500
Subject: [Swift-devel] Swift reference manual and syntax specification
In-Reply-To: <993449932.1291539.1362950888769.JavaMail.root@mcs.anl.gov>
References: <993449932.1291539.1362950888769.JavaMail.root@mcs.anl.gov>
Message-ID:

Sounds good to me. I've been thinking, for other reasons, about doing a tech report on Swift/T to serve as a reference for academic purposes, since we don't have anything we can readily cite. This would include formal semantics for Swift/T.

Aside from formal specifications, I think one major gap is concrete examples that illuminate the trickier points of semantics. I personally find well-constructed examples more helpful for most purposes than formal specifications, but I think that may come down to different learning styles. E.g. one example that I think is enlightening is the one below, which prints 10, 10 times. Really emphasises the point that Swift is declarative rather than imperative.

int A[];
foreach i in [1:10] {
  A[i] = i;
  trace(size(A));
}

- Tim

On Sun, Mar 10, 2013 at 4:28 PM, Michael Wilde wrote:
> Hi All,
>
> We have an activity underway within the ExTENCI project to improve Swift's tutorial and documentation, in collaboration with Amy Apon (chair of Computer Science at Clemson and OSG collaborator) and Clemson graduate student Eric Skogen.
>
> I suggested that we use support questions on the swift-* email lists as one guide to where documentation needs improvement, citing a recent question from a new user who didn't understand what can go in an app() function.
>
> Amy recently suggested that we publish a syntax specification for Swift (which is a great idea and long overdue) and pointed out that the syntax spec could/would have made clear to this new user that you can't put an assignment in an app def.
>
> What I believe we need for Swift is a top-notch User Guide that can double as a hands-on tutorial, and a separate, concise but complete Reference Manual that precisely states language syntax and semantics, and can be used by both users (to answer questions not made clear in the User Guide) and developers (to discuss subtleties or proposed changes in semantics and to deal with implementation issues).
>
> While the User Guide needs to be organized in a reasonable order for exposing the language, the Reference Manual needs to be laid out more top-down (in terms of compilation units) and bottom-up (in terms of lexical issues).
>
> Eric, David, and Ketan are discussing and working on the User Guide. I've asked Ketan to see if the informal BNF-like syntax of the classic K&R C book can be readily adapted to Swift, and also suggested that the concise compact style of the Swift/T User Guide is in fact more of a reference manual already, and could form the basis of a complete reference manual that ideally can serve both Swift/K and Swift/T.
>
> We'll need and be having a lot more discussion on both the User Guide and Reference Manual, but with this e-mail I wanted to move all such discussion to this list (swift-devel).
>
> - Mike
>
> ----- Original Message -----
> > From: "Amy Apon" < aapon at clemson.edu >
> > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > Cc: "Eric Skogen" < eskogen at g.clemson.edu >, "David Kelly" < davidk at ci.uchicago.edu >, "Ketan Maheshwari" < ketan at mcs.anl.gov >
> > Sent: Sunday, March 10, 2013 1:05:54 PM
> > Subject: Re: [Swift-user] Variable Declaration
> >
> > Mike,
> >
> > This is a good idea -- looking at support questions to understand where the "pain points" are.
> >
> > This particular question is related to our conversation on the last call. If we could come up with a grammar or grammar-like description of Swift, then explaining the concept below (that an app() function's body is restricted to contain a single command line template and nothing else), would be explained with a tutorial about the grammar. Are we still looking at this?
> >
> > I would like to schedule another call that includes you, me, and Eric, I think, to keep in touch. How does your time look on Wednesday afternoon this week?
> >
> > Amy
> >
> > Amy Apon, Ph.D.
> > Professor and Chair, Division of Computer Science, School of Computing
> > Clemson University
> > 221 McAdams
> > Phone: 864-656-5769
> >
> > On Sun, Mar 10, 2013 at 12:30 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
> >
> > Eric, All,
> >
> > I think we can gain a lot of insight into documentation/training needs by observing the questions users come with when first using Swift, such as this one.
> >
> > - restrictions on what can be in an app() body, and how you code around them
> >
> > - guidance on the use of types and type names
> >
> > In other words, we can look at support questions and ask "how can the learning roadmap make such support questions less likely to happen".
> >
> > Mike
> >
> > ----- Forwarded Message -----
> > From: "Tim Armstrong" < tim.g.armstrong at gmail.com >
> > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > Sent: Saturday, March 9, 2013 5:56:39 PM
> > Subject: Re: [Swift-user] Variable Declaration
> >
> > What Mike said. I was also going to say that your line:
> >
> > type string;
> >
> > may cause problems: string is a built-in type and you don't need to define it.
> >
> > - Tim
> >
> > On Sat, Mar 9, 2013 at 5:55 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
> >
> > Jay,
> >
> > An app() function's body is restricted to contain a single command line template and nothing else. So instead of:
> >
> > app (messagefile t) parse(messagefile n) {
> >   string v = @regexp("abcdefghi", "c(def)g","monkey");
> >   echo @extractint(n) stdout=@filename(t);
> > }
> >
> > ...you need to create a separate ("compound") function to do things like string v=(). You can embed arbitrary expressions in the app() body, but it can only have one semicolon-terminated command. That's the likely cause of the syntax error.
> >
> > Also note that your statement "string v = ..."
> > is creating a value that as far as I can see is not used anywhere, so you may want to re-examine that logic.
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Jay Lee" < jlee734 at gmail.com >
> > > To: swift-user at ci.uchicago.edu
> > > Sent: Saturday, March 9, 2013 5:48:17 PM
> > > Subject: [Swift-user] Variable Declaration
> > >
> > > Hello,
> > >
> > > I just started with swift today, so excuse my lack of knowledge. I have the following code:
> > >
> > > type messagefile;
> > > type string;
> > >
> > > app (messagefile t) parse(messagefile n) {
> > >   string v = @regexp("abcdefghi", "c(def)g","monkey");
> > >   echo @extractint(n) stdout=@filename(t);
> > > }
> > >
> > > app (messagefile t) greeting() {
> > >   echo "Hello, world!" stdout=@filename(t);
> > > }
> > >
> > > messagefile outfile <"hello.txt">;
> > > messagefile input <"compile.txt">;
> > >
> > > outfile = parse(input);
> > >
> > > I get an error: Could not compile SwiftScript source: line 6:10: expecting a semicolon, found '='
> > >
> > > I found that there are mappers that can be used to declare variables (namely files), but are these required?
> > >
> > > _______________________________________________
> > > Swift-user mailing list
> > > Swift-user at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Sun Mar 10 18:26:53 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 18:26:53 -0500 (CDT)
Subject: [Swift-devel] Swift reference manual and syntax specification
In-Reply-To:
Message-ID: <1649371076.1294063.1362958013186.JavaMail.root@mcs.anl.gov>

I agree that a reference manual should contain examples. And I don't think ours will ever be considered a "formal specification" :)

I think the difference between a User Guide and a Reference is that the User Guide focuses on helping people learn the language while a reference manual is a place to look up finer details by language construct.

Your example is a good one, in that it (and probably several others) would be needed to understand the topic of array closing. The current Swift/T Guide says only "Arrays are part of Swift dataflow semantics. An array is closed when all possible insertions to it are complete" but it doesn't say how to understand that clause "...when all possible insertions to it are complete", in particular what "all possible" means. More examples are needed to understand it fully. Ideally, precise rules and examples complement each other.

I think in fact Swift/K would (or used to) deadlock with this example, because A[] wasn't closed until the block completed, and the block could not complete (i.e. the trace statement could not complete) until the array was closed.

So I believe that a reference guide should fully explain how arrays are closed (perhaps, also, what it means for functions and blocks to "complete"). Such explanations should be precise and complete specifications, but not "formal" in the sense of formal semantic abstractions. In other words, "plain English" (or any other written language) should suffice. They should tell users how the language behaves, and ideally tell a new language implementer (or even a new Swift developer) how the language must be implemented so that ideally all Swift implementations behave the same way.
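
To make the closing rule a little more concrete, here is a minimal sketch in Python (not Swift) of one possible reading of it: an array closes once every statement that could insert into it has done so, and size() is only answered after that point. The ClosableArray class, its writers count, and the callback style are all assumptions made up for illustration; this is not how Swift/K or Swift/T actually implements arrays.

```python
class ClosableArray:
    """Toy model (hypothetical): size() is answered only once the array is closed."""

    def __init__(self, writers):
        self.items = {}
        self.open_writers = writers  # statements that may still insert (assumed known up front)
        self.waiting = []            # callbacks blocked on size()

    def insert(self, key, value):
        if key in self.items:
            raise ValueError("element already assigned")
        self.items[key] = value
        self.open_writers -= 1
        if self.open_writers == 0:
            # All possible insertions are complete: the array is now closed,
            # so every reader that was waiting on size() can proceed.
            for callback in self.waiting:
                callback(len(self.items))
            self.waiting.clear()

    def size(self, callback):
        if self.open_writers == 0:
            callback(len(self.items))      # already closed: answer immediately
        else:
            self.waiting.append(callback)  # still open: defer the answer


traced = []
A = ClosableArray(writers=10)
for i in range(1, 11):       # models: foreach i in [1:10]
    A.insert(i, i)           # models: A[i] = i;
    A.size(traced.append)    # models: trace(size(A));

print(traced)  # a list of ten 10s
```

Under this reading every trace sees the closed array, so each one reports 10. A rule that closed the array only when the enclosing block had completed would leave the size() callbacks waiting forever, which is the deadlock described above.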
- Mike ----- Original Message ----- > From: "Tim Armstrong" > To: "Michael Wilde" > Cc: "Amy Apon" , "Eric Skogen" , "Swift Devel" > > Sent: Sunday, March 10, 2013 6:07:56 PM > Subject: Re: [Swift-devel] Swift reference manual and syntax specification > > Sounds good to me. I've been thinking, for other reasons, about doing > a tech report on Swift/T to serve as a reference for academic > purposes, since we don't have anything we can readily cite. This > would include formal semantics for Swift/T. > > Aside from formal specifications, I think one major gap is concrete > examples that illuminate the trickier points of semantics. I > personally find well-constructed examples more helpful for most > purposes than formal specifications, but I think that may come down > to different learning styles. E.g. one example that I think is > enlightening is the one below, which prints 10, 10 times. Really > emphasises the point that Swift is declarative rather then > imperative. > > int A[]; > foreach i in [1:10] { > A[i] = i; > trace(size(A)); > } > > - Tim > > > > On Sun, Mar 10, 2013 at 4:28 PM, Michael Wilde < wilde at mcs.anl.gov > > wrote: > > > Hi All, > > We have an activity underway within the ExTENCI project to improve > Swift's tutorial and documentation, in collaboration with Amy Apon > (chair of Computer Science at Clemson and OSG collaborator) and > Clemson graduate student Eric Skogen. > > I suggested that we use support questions on the swift-* email lists > as one guide to where documentation needs improvement, citing a > recent question from a new user who didn't understand what can go in > an app() function. > > Amy recently suggested that we publish a syntax specification for > Swift (which is a great idea and long overdue) and pointed out that > the syntax spec could/would have made clear to this new user that > you can't put an assignment in an app def. 
> > What I believe we need for Swift is a top-notch User Guide that can > double as a hands-on tutorial, and a separate, concise but complete > Reference Manual that precisely states language syntax and > semantics, and can be used by both users (to answer questions not > made clear in the User Guide) and developers (to discuss subtleties > or proposed changes in semantics and to deal with implementation > issues). > > While the User Guide needs to be organized in a reasonable order for > exposing the language, the Reference Manual needs to be laid out > more top-down (in terms of compilation units) and bottom-up (in > terms of lexical issues). > > Eric, David, and Ketan are discussing and working on the User Guide. > Ive asked Ketan to see if the informal BNF-like syntax of the > classic K&R C book can be readily adapted to Swift, and also > suggested that the concise compact style of the Swift/T User Guide > is in fact more of a reference manual already, and could form the > basis of a complete reference manual that ideally can serve both > Swift/K and Swift/T. > > We'll need and be having a lot more discussion on both the User Guide > and Reference Manual, but with this e-mail I wanted to move all such > discussion to this list (swift-devel). > > - Mike > > ----- Original Message ----- > > From: "Amy Apon" < aapon at clemson.edu > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > Cc: "Eric Skogen" < eskogen at g.clemson.edu >, "David Kelly" < > > davidk at ci.uchicago.edu >, "Ketan Maheshwari" > > < ketan at mcs.anl.gov > > > Sent: Sunday, March 10, 2013 1:05:54 PM > > Subject: Re: [Swift-user] Variable Declaration > > > > Mike, > > > > > > This is a good idea -- looking at support questions to understand > > where the "pain points" are. > > > > > > This particular question is related to our conversation on the last > > call. 
If we could come up with a grammar or grammar-like > > description > > of Swift, then explaining the concept below (that an app() > > function's body is restricted to contain a single command line > > template and nothing else),would be explained with a tutorial about > > the grammar. Are we still looking at this? > > > > > > I would like to schedule another call that includes you, me, and > > Eric, I think, to keep in touch. How does your time look on > > Wednesday afternoon this week? > > > > > > Amy > > > > > > > > > > > > Amy Apon, Ph.D. > > Professor and Chair, Division of Computer Science, School of > > Computing > > Clemson University > > 221 McAdams > > Phone: 864-656-5769 > > > > > > > > > > > > > > > > > > On Sun, Mar 10, 2013 at 12:30 PM, Michael Wilde < wilde at mcs.anl.gov > > > > > wrote: > > > > > > Eric, All, > > > > I think we can gain a lot of insight into documentation/training > > needs by observing the questions users come with when first using > > Swift, such as this one. > > > > - restrictions on what can be in an app() body, and how you code > > around them > > > > - guidance on the use of types and type names > > > > In other words, we can look at support questions and ask "how can > > the > > learning roadmap make such support questions less likely to > > happen". > > > > Mike > > > > ----- Forwarded Message ----- > > From: "Tim Armstrong" < tim.g.armstrong at gmail.com > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > Sent: Saturday, March 9, 2013 5:56:39 PM > > Subject: Re: [Swift-user] Variable Declaration > > > > What Mike said. I was also going to say that your line: > > > > type string; > > > > may cause problems: string is a built-in type and you don't need to > > define it. > > > > - Tim > > > > > > On Sat, Mar 9, 2013 at 5:55 PM, Michael Wilde < wilde at mcs.anl.gov > > > wrote: > > > > > > Jay, > > > > An app() function's body is restricted to contain a single command > > line template and nothing else. 
> > So instead of:
> >
> > app (messagefile t) parse(messagefile n) {
> >   string v = @regexp("abcdefghi", "c(def)g","monkey");
> >   echo @extractint(n) stdout=@filename(t);
> > }
> >
> > ...you need to create a separate ("compound") function to do things
> > like string v=(). You can embed arbitrary expressions in the app()
> > body, but it can only have one semicolon-terminated command. That's
> > the likely cause of the syntax error.
> >
> > Also note that your statement "string v = ..." is creating a value
> > that as far as I can see is not used anywhere, so you may want to
> > re-examine that logic.
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Jay Lee" < jlee734 at gmail.com >
> > > To: swift-user at ci.uchicago.edu
> > > Sent: Saturday, March 9, 2013 5:48:17 PM
> > > Subject: [Swift-user] Variable Declaration
> > >
> > > Hello,
> > >
> > > I just started with swift today, so excuse my lack of knowledge.
> > > I have the following code:
> > >
> > > type messagefile;
> > > type string;
> > >
> > > app (messagefile t) parse(messagefile n) {
> > >   string v = @regexp("abcdefghi", "c(def)g","monkey");
> > >   echo @extractint(n) stdout=@filename(t);
> > > }
> > >
> > > app (messagefile t) greeting() {
> > >   echo "Hello, world!" stdout=@filename(t);
> > > }
> > >
> > > messagefile outfile <"hello.txt">;
> > > messagefile input <"compile.txt">;
> > >
> > > outfile = parse(input);
> > >
> > > I get an error: Could not compile SwiftScript source: line 6:10:
> > > expecting a semicolon, found '='
> > >
> > > I found that there are mappers that can be used to declare
> > > variables (namely files), but are these required?
> > >
> > > _______________________________________________
> > > Swift-user mailing list
> > > Swift-user at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From benc at hawaga.org.uk Sun Mar 10 18:46:03 2013
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 10 Mar 2013 23:46:03 +0000 (UTC)
Subject: [Swift-devel] Swift reference manual and syntax specification
In-Reply-To: <1649371076.1294063.1362958013186.JavaMail.root@mcs.anl.gov>
References: <1649371076.1294063.1362958013186.JavaMail.root@mcs.anl.gov>
Message-ID:

> Your example is a good one, in that it (and probably several others)
> would be needed to understand the topic of array closing. The current
> Swift/T Guide says only "Arrays are part of Swift dataflow semantics. An
> array is closed when all possible insertions to it are complete" but it
> doesn't say how to understand that clause "...when all possible
> insertions to it are complete", in particular what "all possible" means.
> More examples are needed to understand it fully. Ideally, precise rules
> and examples complement each other.

wrt this, one approach to array closing that I had before was regarding the state of the array as constrained to move on a directed graph of states until it reaches an end state (the closed state) - that gives not really a state machine, but perhaps something a bit like it, onto which maybe you could say "this kind of Swift code moves an array into this state".
I never wrote that more formally, and I don't think it lines up entirely with the way that Swift does things, but it might be interesting to pursue.

--

From tim.g.armstrong at gmail.com Sun Mar 10 21:12:13 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Sun, 10 Mar 2013 21:12:13 -0500
Subject: [Swift-devel] Swift reference manual and syntax specification
In-Reply-To:
References: <1649371076.1294063.1362958013186.JavaMail.root@mcs.anl.gov>
Message-ID:

There's been someone at Indiana working on semantics of monotonic variables, which is pretty much exactly what Ben described: http://www.cs.indiana.edu/~rrnewton/papers/2012-lambdapar-draft.pdf

On Sun, Mar 10, 2013 at 6:46 PM, Ben Clifford wrote:

>
> > Your example is a good one, in that it (and probably several others)
> > would be needed to understand the topic of array closing. The current
> > Swift/T Guide says only "Arrays are part of Swift dataflow semantics. An
> > array is closed when all possible insertions to it are complete" but it
> > doesn't say how to understand that clause "...when all possible
> > insertions to it are complete", in particular what "all possible" means.
> > More examples are needed to understand it fully. Ideally, precise rules
> > and examples complement each other.
>
> wrt this, one approach to array closing that I had before was regarding
> the state of the array as constrained to move on a directed graph of
> states until it reaches an end state (the closed state) - that gives not
> really a state machine, but perhaps something a bit like it, onto which
> maybe you could say "this kind of Swift code moves an array into this
> state". I never wrote that more formally, and I don't think it lines up
> entirely with the way that Swift does things, but it might be interesting
> to pursue.
>
> --
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From hategan at mcs.anl.gov Sun Mar 10 21:28:34 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 10 Mar 2013 19:28:34 -0700 Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1649997961.1291675.1362951469901.JavaMail.root@mcs.anl.gov> References: <1649997961.1291675.1362951469901.JavaMail.root@mcs.anl.gov> Message-ID: <1362968914.958.0.camel@echo> Nono, I did the commits in the 0.94 branch, not trunk. Mihael On Sun, 2013-03-10 at 16:37 -0500, Michael Wilde wrote: > Mihael, it seems that the problem is still there under the current trunk - see below. This is in: > > midway:/home/wilde/osgdemo/modis/svn/run035. > > The "cog modified locally" is a hopefully inconsequential change in worker.pl where I open stdin to /dev/null rather than close it, before launching an app, to remedy an unrelated MPI problem. > > - Mike > > Swift trunk swift-r6362 cog-r3637 (cog modified locally) > > RunID: 20130310-2055-4lqjiftd > Progress: time: Sun, 10 Mar 2013 20:55:52 +0000 > Progress: time: Sun, 10 Mar 2013 20:56:06 +0000 Selecting site:269 Submitting:47 Submitted:1 > Progress: time: Sun, 10 Mar 2013 20:56:12 +0000 Selecting site:269 Stage in:1 Submitted:47 > Progress: time: Sun, 10 Mar 2013 20:56:16 +0000 Selecting site:269 Stage in:25 Submitted:23 > Progress: time: Sun, 10 Mar 2013 20:56:22 +0000 Selecting site:269 Stage in:48 > Progress: time: Sun, 10 Mar 2013 20:56:52 +0000 Selecting site:269 Stage in:48 > Progress: time: Sun, 10 Mar 2013 20:57:22 +0000 Selecting site:269 Stage in:48 > Progress: time: Sun, 10 Mar 2013 20:57:52 +0000 Selecting site:269 Stage in:48 > Progress: time: Sun, 10 Mar 2013 20:58:19 +0000 Selecting site:269 Stage in:47 Active:1 > Progress: time: Sun, 10 Mar 2013 20:58:20 +0000 Selecting site:269 Stage in:26 Active:22 > Progress: time: Sun, 10 Mar 2013 20:58:22 +0000 Selecting site:269 Stage in:24 Active:24 > Progress: time: Sun, 10 Mar 2013 20:58:24 +0000 Selecting site:269 Stage in:23 Active:25 > 
Progress: time: Sun, 10 Mar 2013 20:58:26 +0000 Selecting site:269 Active:47 Stage out:1 > Progress: time: Sun, 10 Mar 2013 20:58:27 +0000 Selecting site:260 Stage in:7 Submitting:1 Submitted:1 Active:39 Finished successfully:9 > Progress: time: Sun, 10 Mar 2013 20:58:28 +0000 Selecting site:258 Stage in:9 Submitting:1 Submitted:1 Active:24 Stage out:13 Finished successfully:11 > Progress: time: Sun, 10 Mar 2013 20:58:29 +0000 Selecting site:245 Stage in:23 Submitted:1 Active:24 Finished successfully:24 > Progress: time: Sun, 10 Mar 2013 20:58:31 +0000 Selecting site:245 Stage in:24 Active:23 Stage out:1 Finished successfully:24 > Progress: time: Sun, 10 Mar 2013 20:58:32 +0000 Selecting site:245 Stage in:24 Active:23 Finished successfully:25 > Progress: time: Sun, 10 Mar 2013 20:58:34 +0000 Selecting site:244 Stage in:24 Submitting:1 Stage out:22 Finished successfully:26 > Progress: time: Sun, 10 Mar 2013 20:58:35 +0000 Selecting site:221 Stage in:25 Submitting:22 Submitted:1 Finished successfully:48 > Progress: time: Sun, 10 Mar 2013 20:58:52 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sun, 10 Mar 2013 20:59:22 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sun, 10 Mar 2013 20:59:52 +0000 Selecting site:221 Stage in:48 Finished successfully:48 > Progress: time: Sun, 10 Mar 2013 20:59:56 +0000 Selecting site:221 Stage in:47 Active:1 Finished successfully:48 > Progress: time: Sun, 10 Mar 2013 21:00:02 +0000 Selecting site:221 Stage in:47 Stage out:1 Finished successfully:48 > Progress: time: Sun, 10 Mar 2013 21:00:05 +0000 Selecting site:221 Stage in:47 Finished successfully:49 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at 
https://192.5.86.107:50000=MetaChannel[service-60734] -> BufferingChannel, null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60018 > Meta context: service-60734 > Progress: time: Sun, 10 Mar 2013 21:00:07 +0000 Selecting site:220 Stage in:47 Submitted:1 Finished successfully:49 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] -> BufferingChannel, null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60263 > Meta context: service-60734 > Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] -> BufferingChannel, null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > Context: service-60408 > Meta context: service-60734 > Progress: time: Sun, 10 Mar 2013 21:00:18 +0000 Selecting site:220 Stage in:47 Active:1 Finished successfully:49 > Progress: time: Sun, 10 Mar 2013 21:00:19 +0000 Selecting site:220 Stage in:46 Active:2 Finished successfully:49 > Execution failed: > Exception in getlanduse: > Arguments: 
[home/wilde/osgdemo/modis/svn/data/modis/2002/h12v09.rgb]
> Host: beagle
> Directory: modis02-20130310-2055-4lqjiftd/jobs/y/getlanduse-yht64f6l
>
> Caused by:
> Shutting down worker
> getLandUse, modis02.swift, line 20
> error null
>
> real 4m29.509s
> user 2m45.981s
> sys 0m3.520s
>
>
> ----- Original Message -----
> > From: "Michael Wilde"
> > To: "Mihael Hategan"
> > Cc: "Swift Devel"
> > Sent: Sunday, March 10, 2013 3:20:53 PM
> > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> >
> > Duh. Thank you. I didn't build a new release, was using same 0.94
> > RC4 code.
> >
> > Sorry about that. Will retest.
> >
> > - Mike
> >
> >
> > ----- Original Message -----
> > > From: "Mihael Hategan"
> > > To: "Michael Wilde"
> > > Cc: "Swift Devel"
> > > Sent: Sunday, March 10, 2013 3:06:25 PM
> > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > midway to beagle
> > >
> > > ChannelContext Notifying commands and handlers about exception
> > > org.globus.cog.karajan.workflow.service.TimeoutException: Channel
> > > timed out. lastTime=940817-071255.807, now=130310-164156.506,
> > > channel=GSSSChannel-1463847073(1)[service-60519]
> > >
> > > Are you sure you are running with the latest code? There was a
> > > (mostly inconsequential) bug before that set lastTime to
> > > Long.MAX_VALUE before creating that exception. That was fixed. Your
> > > message indicates the code you are using does not have that fix
> > > (year xx94 is what comes out of Long.MAX_VALUE).
> > >
> > > I gotta go now, but I'll come back later and check some more. There
> > > is something weird going on there besides that.
> > >
> > > Mihael
> > >
> > > On Sun, 2013-03-10 at 12:01 -0500, Michael Wilde wrote:
> > > > Here's run034: seems to be a bit better, but still dies. This is
> > > > with throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc to
> > > > beagle. 17MB files.
Still seems to curiously die about 4 mins > > > > into the run, which suggests some kind of timeout is still > > > > lurking??? > > > > > > > > - Mike > > > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > > > RunID: 20130310-1639-kyb8hca9 > > > > Progress: time: Sun, 10 Mar 2013 16:39:45 +0000 > > > > Progress: time: Sun, 10 Mar 2013 16:39:56 +0000 Selecting > > > > site:269 Submitting:47 Submitted:1 > > > > Progress: time: Sun, 10 Mar 2013 16:40:01 +0000 Selecting > > > > site:269 Stage in:1 Submitted:47 > > > > Progress: time: Sun, 10 Mar 2013 16:40:15 +0000 Selecting > > > > site:269 Stage in:48 > > > > Progress: time: Sun, 10 Mar 2013 16:40:45 +0000 Selecting > > > > site:269 Stage in:48 > > > > Progress: time: Sun, 10 Mar 2013 16:41:15 +0000 Selecting > > > > site:269 Stage in:48 > > > > Progress: time: Sun, 10 Mar 2013 16:41:45 +0000 Selecting > > > > site:269 Stage in:48 > > > > Progress: time: Sun, 10 Mar 2013 16:42:11 +0000 Selecting > > > > site:269 Stage in:47 Active:1 > > > > Progress: time: Sun, 10 Mar 2013 16:42:12 +0000 Selecting > > > > site:269 Stage in:41 Active:7 > > > > Progress: time: Sun, 10 Mar 2013 16:42:13 +0000 Selecting > > > > site:269 Stage in:23 Active:25 > > > > Progress: time: Sun, 10 Mar 2013 16:42:15 +0000 Selecting > > > > site:269 Active:48 > > > > Progress: time: Sun, 10 Mar 2013 16:42:17 +0000 Selecting > > > > site:269 Active:47 Stage out:1 > > > > Progress: time: Sun, 10 Mar 2013 16:42:18 +0000 Selecting > > > > site:268 Stage in:1 Active:46 Stage out:1 Finished > > > > successfully:1 > > > > Progress: time: Sun, 10 Mar 2013 16:42:19 +0000 Selecting > > > > site:265 Stage in:3 Submitted:1 Active:42 Stage out:2 > > > > Finished successfully:4 > > > > Progress: time: Sun, 10 Mar 2013 16:42:20 +0000 Selecting > > > > site:258 Stage in:6 Submitting:5 Active:23 Stage out:13 > > > > Finished successfully:12 > > > > Progress: time: Sun, 10 Mar 2013 16:42:21 +0000 Selecting > > > > site:244 Stage 
in:24 Submitting:1 Active:20 Stage out:3 > > > > Finished successfully:25 > > > > Progress: time: Sun, 10 Mar 2013 16:42:23 +0000 Selecting > > > > site:241 Stage in:25 Submitting:3 Stage out:19 Finished > > > > successfully:29 > > > > Progress: time: Sun, 10 Mar 2013 16:42:24 +0000 Selecting > > > > site:221 Stage in:28 Submitting:19 Submitted:1 Finished > > > > successfully:48 > > > > Progress: time: Sun, 10 Mar 2013 16:42:45 +0000 Selecting > > > > site:221 Stage in:48 Finished successfully:48 > > > > Progress: time: Sun, 10 Mar 2013 16:42:54 +0000 Selecting > > > > site:221 Stage in:47 Active:1 Finished successfully:48 > > > > Progress: time: Sun, 10 Mar 2013 16:43:00 +0000 Selecting > > > > site:221 Stage in:47 Stage out:1 Finished successfully:48 > > > > Progress: time: Sun, 10 Mar 2013 16:43:02 +0000 Selecting > > > > site:221 Stage in:47 Finished successfully:49 > > > > Progress: time: Sun, 10 Mar 2013 16:43:05 +0000 Selecting > > > > site:220 Stage in:47 Submitted:1 Finished successfully:49 > > > > Progress: time: Sun, 10 Mar 2013 16:43:15 +0000 Selecting > > > > site:220 Stage in:48 Finished successfully:49 > > > > Progress: time: Sun, 10 Mar 2013 16:43:45 +0000 Selecting > > > > site:220 Stage in:48 Finished successfully:49 > > > > Channels: > > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > > -> > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > > -> BufferingChannel, > > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > > -> > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > > -> BufferingChannel} > > > > Context: service-60859 > > > > Meta context: service-60519 > > > > 
Progress: time: Sun, 10 Mar 2013 16:43:59 +0000 Selecting > > > > site:220 Stage in:47 Active:1 Finished successfully:49 > > > > Channels: > > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > > -> > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > > -> BufferingChannel, > > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > > -> > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > > -> BufferingChannel} > > > > Context: service-60663 > > > > Meta context: service-60519 > > > > Progress: time: Sun, 10 Mar 2013 16:44:05 +0000 Selecting > > > > site:220 Stage in:47 Stage out:1 Finished successfully:49 > > > > Progress: time: Sun, 10 Mar 2013 16:44:07 +0000 Selecting > > > > site:220 Stage in:47 Finished successfully:50 > > > > Channels: > > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > > -> > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > > -> BufferingChannel, > > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > > -> > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > > -> BufferingChannel} > > > > Context: service-60081 > > > > Meta context: service-60519 > > > > Progress: time: Sun, 10 Mar 2013 16:44:09 +0000 Selecting > > > > site:219 Stage in:45 Submitting:1 Active:2 Finished > > > > successfully:50 > > > > 
Execution failed:
> > > > Exception in getlanduse:
> > > > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb]
> > > > Host: beagle
> > > > Directory: modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l
> > > >
> > > > Caused by:
> > > > Shutting down worker
> > > > getLandUse, modis02.swift, line 20
> > > > error null
> > > >
> > > > real 4m27.007s
> > > > user 2m44.221s
> > > > sys 0m3.448s
> > > > + mv /home/wilde/.swift/runs/current/run034.1362933583 /home/wilde/.swift/runs/completed
> > > > midway001$
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Mihael Hategan"
> > > > > To: "Michael Wilde"
> > > > > Cc: "Swift Devel"
> > > > > Sent: Sunday, March 10, 2013 1:36:26 AM
> > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from
> > > > > midway to beagle
> > > > >
> > > > > Please try now. I made some changes:
> > > > >
> > > > > 1. start the service with "-l" so that things in your .profile
> > > > > (such as module load sun-java) would be picked up. However, this
> > > > > also means that you should unset X509_* stuff or the sshcl proxy
> > > > > forwarding will not work properly.
> > > > >
> > > > > 2. I fixed a bug that caused an extra connection to the coaster
> > > > > service. Normally the service connects back to the client and
> > > > > both use that connection. However, due to some changes in the way
> > > > > credentials were set for jobs, and the fact that connections were
> > > > > looked up based on both hostname and credential, the coaster
> > > > > client would ignore the existing connection and create another
> > > > > one. The initial one would then time out at some point, causing
> > > > > the service to crash.
> > > > > > > > > > Mihael > > > > > > > > > > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote: > > > > > > An update on this provider staging related issue: reducing > > > > > > filesize > > > > > > from 17MB to 600KB runs well. > > > > > > > > > > > > So seems like some kind of flow control or buffer management > > > > > > problem, possibly? > > > > > > > > > > > > May need to take that problem offline - would be a perfect > > > > > > test > > > > > > case for Yadu to develop a new stress test for. > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Forwarded Message ----- > > > > > > From: "Michael Wilde" > > > > > > To: "David Kelly" > > > > > > Sent: Saturday, March 9, 2013 5:21:49 PM > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > OK, much better: with 600K files (5x5 reduction or 25X > > > > > > smaller) > > > > > > it > > > > > > works well, and fast (form midway to beagle!) > > > > > > > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > > > > > > > RunID: 20130309-2319-5zq0jrfg > > > > > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > > > > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 Selecting > > > > > > site:269 Submitting:47 Submitted:1 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 Selecting > > > > > > site:269 Stage in:1 Submitted:47 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 Selecting > > > > > > site:269 Stage in:47 Active:1 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 Selecting > > > > > > site:269 Stage in:46 Active:1 Stage out:1 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 Selecting > > > > > > site:250 Stage in:19 Active:28 Stage out:1 Finished > > > > > > successfully:19 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 Selecting > > > > > > site:229 Stage in:18 Submitting:21 Active:1 Stage out:7 > > > > > > Finished successfully:41 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:13 
+0000 Selecting > > > > > > site:220 Stage in:41 Submitting:1 Active:5 Stage out:1 > > > > > > Finished successfully:49 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 Selecting > > > > > > site:220 Stage in:38 Active:1 Stage out:9 Finished > > > > > > successfully:49 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 Selecting > > > > > > site:212 Stage in:30 Submitting:8 Stage out:9 Finished > > > > > > successfully:58 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 Selecting > > > > > > site:203 Stage in:38 Submitting:8 Submitted:1 Finished > > > > > > successfully:67 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:18 +0000 Selecting > > > > > > site:202 Stage in:19 Stage out:28 Finished successfully:68 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 Selecting > > > > > > site:172 Stage in:33 Submitting:2 Submitted:6 Active:5 > > > > > > Stage > > > > > > out:2 Finished successfully:97 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 Selecting > > > > > > site:170 Stage in:31 Submitting:2 Stage out:14 Finished > > > > > > successfully:100 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 Selecting > > > > > > site:162 Stage in:30 Submitting:10 Stage out:6 Finished > > > > > > successfully:109 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 Selecting > > > > > > site:154 Stage in:39 Submitting:5 Submitted:3 Active:1 > > > > > > Finished successfully:115 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 Selecting > > > > > > site:154 Stage in:21 Active:10 Stage out:16 Finished > > > > > > successfully:116 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 Selecting > > > > > > site:126 Stage in:20 Submitting:25 Submitted:1 Stage > > > > > > out:2 > > > > > > Finished successfully:143 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 Selecting > > > > > > site:124 Stage in:31 Active:2 Stage out:15 Finished > > > > > > successfully:145 > > > > > > Progress: 
time: Sat, 09 Mar 2013 23:20:26 +0000 Selecting > > > > > > site:110 Stage in:30 Submitting:14 Stage out:3 Finished > > > > > > successfully:160 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 Selecting > > > > > > site:106 Stage in:43 Submitting:1 Submitted:1 Active:1 > > > > > > Stage > > > > > > out:2 Finished successfully:163 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 Selecting > > > > > > site:104 Stage in:20 Submitting:2 Active:7 Stage out:19 > > > > > > Finished successfully:165 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 Selecting > > > > > > site:78 > > > > > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 > > > > > > Finished > > > > > > successfully:191 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 Selecting > > > > > > site:76 > > > > > > Stage in:30 Stage out:17 Finished successfully:194 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 Selecting > > > > > > site:58 > > > > > > Stage in:29 Submitting:18 Active:1 Finished > > > > > > successfully:211 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 Selecting > > > > > > site:58 > > > > > > Stage in:33 Active:3 Stage out:12 Finished > > > > > > successfully:211 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 Selecting > > > > > > site:46 > > > > > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage > > > > > > out:14 > > > > > > Finished successfully:225 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 Selecting > > > > > > site:30 > > > > > > Stage in:29 Active:14 Stage out:3 Finished > > > > > > successfully:241 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 Selecting > > > > > > site:28 > > > > > > Stage in:28 Submitting:2 Stage out:17 Finished > > > > > > successfully:242 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 Selecting > > > > > > site:10 > > > > > > Stage in:30 Submitting:17 Submitted:1 Finished > > > > > > successfully:259 > > > > > > Progress: time: Sat, 
09 Mar 2013 23:20:38 +0000 Selecting > > > > > > site:10 > > > > > > Stage in:35 Stage out:13 Finished successfully:259 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage in:21 > > > > > > Submitting:6 Submitted:3 Stage out:15 Finished > > > > > > successfully:272 > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage in:10 > > > > > > Active:5 Stage out:14 Finished successfully:288 > > > > > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > > > > > successfully:317 > > > > > > > > > > > > real 0m58.953s > > > > > > user 0m32.573s > > > > > > sys 0m1.263s > > > > > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > > > > > /home/wilde/.swift/runs/completed > > > > > > midway001$ > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > From: "David Kelly" > > > > > > > To: "Michael Wilde" > > > > > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > Yep - I had a version where the input files were in a very > > > > > > > similar > > > > > > > format (PGM, 1 byte per pixel). I'll add that back, but > > > > > > > without > > > > > > > the > > > > > > > small PGM header in the files. > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 5:04:43 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > I think we need to cut down the size of these files for a > > > > > > > demo > > > > > > > (although they are great for a stress test). > > > > > > > > > > > > > > First, the RGB format by itself uses 3 bytes per pixel when > > > > > > > it > > > > > > > only > > > > > > > needs one (for land use) > > > > > > > > > > > > > > Second, we should cut down by a factor of 9 (3x3) or 16 > > > > > > > (4x4). 
> > > > > > > > > > > > > > I tried that using simple convert statements, but it always > > > > > > > seems > > > > > > > to > > > > > > > yield a file exactly double what it should be. > > > > > > > > > > > > > > More on this later; was hoping to get things working "as > > > > > > > is" > > > > > > > first. > > > > > > > > > > > > > > I assume you could get the perl code to work on > > > > > > > one-byte-per-pixel > > > > > > > instead of the default 3 for the convert rgb format? > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > That would probably be a good idea for a new script, to > > > > > > > > show > > > > > > > > how to > > > > > > > > stage apps like that. For now I updated the scripts on > > > > > > > > lustre.. > > > > > > > > hopefully that helps. > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > OK, I see that its trying to run getlanduse.sh from your > > > > > > > > /lustre > > > > > > > > dir > > > > > > > > on beagle, which is different than the one Ive got > > > > > > > > checked > > > > > > > > out. > > > > > > > > It > > > > > > > > seems to get an error in a stderr redirect??? Let me se > > > > > > > > what I > > > > > > > > need > > > > > > > > to do to get the beagle side in sync. > > > > > > > > > > > > > > > > Seems like since these are perl scripts, we should make > > > > > > > > the > > > > > > > > app() > > > > > > > > /bin/sh and send the script as data, perhaps? 
> > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > OK, making progress. Now I dialed down the throttle and > > > > > > > > > node > > > > > > > > > counts > > > > > > > > > to 48 jobs. > > > > > > > > > > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Submitting:47 Submitted:1 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:1 Submitted:47 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:25 Submitted:23 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:48 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:48 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:48 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:48 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:47 Active:1 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 > > 
> > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:36 Active:12 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:24 Active:24 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:24 Active:23 Stage out:1 > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 > > > > > > > > > Selecting > > > > > > > > > site:269 > > > > > > > > > Stage in:14 Active:33 Stage out:1 > > > > > > > > > Execution failed: > > > > > > > > > Exception in getlanduse: > > > > > > > > > Arguments: > > > > > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > > > > > Host: beagle > > > > > > > > > Directory: > > > > > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > > > > > > > > > Caused by: > > > > > > > > > Application > > > > > > > > > /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > > > > > failed > > > > > > > > > with an exit code of 1 > > > > > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > > > > > > > > > real 2m31.463s > > > > > > > > > user 1m33.238s > > > > > > > > > sys 0m2.160s > > > > > > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 > > > > > > > > > /home/wilde/.swift/runs/completed > > > > > > > > > midway001$ > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "David Kelly" > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. 
The run dir I used was /scratch/midway/davidkelly999/modis/run011
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > >
> > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > To: "David Kelly"
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > >
> > > > > > > > > > I just tried this, but it didn't work - same prob.
> > > > > > > > > >
> > > > > > > > > > But if it's working for you now, we must be close.
> > > > > > > > > >
> > > > > > > > > > Not yet sure what the diff is...
> > > > > > > > > >
> > > > > > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > > > > > >
> > > > > > > > > > - Mike
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > Had to make sure I was using the IP address on eth4 (128.135.112.71
> > > > > > > > > > > for midway-login1), not a local address or an InfiniBand address.
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > >
> > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > I just got it working. I had to adjust for the differences in my
> > > > > > > > > > > username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME on Midway
> > > > > > > > > > > to the IP address, rather than the full hostname.
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > >
> > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > Is your username the same on beagle and midway?
> > > > > > > > > > >
> > > > > > > > > > > Yes. And I verified that I can ssh to login4 on beagle from my midway
> > > > > > > > > > > session (as indeed the scp's of the proxy files seem to be working)
> > > > > > > > > > >
> > > > > > > > > > > - Mike
> > > > > > > > > > >
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > >
> > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > >
> > > > > > > > > > > > OK.
> > > > > > > > > > > >
> > > > > > > > > > > > Ignore what I said about "problem finding java" - that's code in the
> > > > > > > > > > > > very long escaped shell command that gets sent to the remote side. I
> > > > > > > > > > > > don't *think* that's the problem.
> > > > > > > > > > > >
> > > > > > > > > > > > I also verified that beagle can connect to ports 50001 etc on
> > > > > > > > > > > > swift.rcc, and that seems OK.
> > > > > > > > > > > >
> > > > > > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on the midway
> > > > > > > > > > > > side. And the beagle side seems to be connecting there.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm a bit confused about the timestamps I see for the proxy expiration
> > > > > > > > > > > > time, but am not yet suspicious of that (although it seems less than
> > > > > > > > > > > > 5 hours past GMT... not sure.)
> > > > > > > > > > > >
> > > > > > > > > > > > - Mike
> > > > > > > > > > > >
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > From: "David Kelly"
> > > > > > > > > > > > > To: "Michael Wilde"
> > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm seeing the same error now.. looking into it
> > > > > > > > > > > > >
> > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > >
> > > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > >
> > > > > > > > > > > > > Looking deeper I see that the logs show problems with finding Java, I
> > > > > > > > > > > > > assume on beagle, and also service ending (presumably coaster service
> > > > > > > > > > > > > on midway host).
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'll dig into these two.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I see that it scp's the proxies to beagle which I think answers my
> > > > > > > > > > > > > question about security.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Mike
> > > > > > > > > > > > >
> > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > From: "Michael Wilde"
> > > > > > > > > > > > > > To: "David Kelly"
> > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get it > > > > > > > > > > > > > > to > > > > > > > > > > > > > > work, > > > > > > > > > > > > > > same > > > > > > > > > > > > > > error > > > > > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with > > > > > > > > > > > > > > automatic > > > > > > > > > > > > > > coasters, > > > > > > > > > > > > > > what > > > > > > > > > > > > > > configuration (sites env etc) did you use? > > > > > > > > > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back to > > > > > > > > > > > > > > the > > > > > > > > > > > > > > midway > > > > > > > > > > > > > > hosts > > > > > > > > > > > > > > and > > > > > > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create a > > > > > > > > > > > > > > proxy > > > > > > > > > > > > > > etc? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the > > > > > > > > > > > > > > > default > > > > > > > > > > > > > > > templates > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > create > > > > > > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if > > > > > > > > > > > > > > > that's > > > > > > > > > > > > > > > what > > > > > > 
> > > > > > > > > you > > > > > > > > > > > > > > > mean > > > > > > > > > > > > > > > by > > > > > > > > > > > > > > > a local sites dir or not). But you are > > > > > > > > > > > > > > > right > > > > > > > > > > > > > > > about > > > > > > > > > > > > > > > Midway > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > noticed that when using modis it will > > > > > > > > > > > > > > > sometimes > > > > > > > > > > > > > > > get > > > > > > > > > > > > > > > stuck > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > goes to a queue that is busy. Ideally swift > > > > > > > > > > > > > > > replication > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > able to help better handle that, but I > > > > > > > > > > > > > > > haven't > > > > > > > > > > > > > > > had > > > > > > > > > > > > > > > much > > > > > > > > > > > > > > > luck > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > that yet. Another way around this may be to > > > > > > > > > > > > > > > add > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It > > > > > > > > > > > > > > > went > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > swift-devel > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > discussion but was never fixed. I think it > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > relatively > > > > > > > > > > > > > > > simple > > > > > > > > > > > > > > > though.. 
probably worth fixing before > > > > > > > > > > > > > > > release. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel free > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > stay > > > > > > > > > > > > > > > Tue > > > > > > > > > > > > > > > night > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so I > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > modify > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > sites > > > > > > > > > > > > > > > templates; thats not working for me either > > > > > > > > > > > > > > > yet. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere or > > > > > > > > > > > > > > > sandyb > > > > > > > > > > > > > > > (but > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > both) > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > ensure 1-node jobs, because either queue > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > get > > > > > > > > > > > > > > > filled > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > yield an idle node for a long time. 
maybe > > > > > > > > > > > > > > > need to > > > > > > > > > > > > > > > fiddle > > > > > > > > > > > > > > > jobsPerNode > > > > > > > > > > > > > > > to get at least 1 core when the system is > > > > > > > > > > > > > > > busy > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > *pretend* > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; That > > > > > > > > > > > > > > > isnt > > > > > > > > > > > > > > > working > > > > > > > > > > > > > > > because > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > template sites file is wrong in swift 0.94 > > > > > > > > > > > > > > > rc4. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still getting > > > > > > > > > > > > > > > produced - > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > thought > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > eliminated that. Did it come back due to a > > > > > > > > > > > > > > > problem > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM > > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I > > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > most > > > > > > > > > > > > > > > > interesting/useful talks will be on > > > > > > > > > > > > > > > > Tuesday. > > > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > > > I'll > > > > > > > > > > > > > > > > come > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > Argonne to work on any loose ends and put > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > finishing > > > > > > > > > > > > > > > > touches > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > any slides/runs/scripts, then drive to > > > > > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > > > afternoon/evening. I have a hotel booked > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we > > > > > > > > > > > > > > > > talked > > > > > > > > > > > > > > > > about. 
> > > > > > > > > > > > > > > > I'm > > > > > > > > > > > > > > > > pretty > > > > > > > > > > > > > > > > sure > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > have working configurations for > > > > > > > > > > > > > > > > everything > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > talked > > > > > > > > > > > > > > > > about, > > > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > think it's really just a matter of > > > > > > > > > > > > > > > > plugging > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > apps. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM > > > > > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im > > > > > > > > > > > > > > > > looking > > > > > > > > > > > > > > > > into > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > run > > > > > > > > > > > > > > > > options > > > > > > > > > > > > > > > > now. Im hoping to try a few... WIll see > > > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > > much > > > > > > > > > > > > > > > > help > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > need. > > > > > > > > > > > > > > > > Have > > > > > > > > > > > > > > > > you decided on a driving time and made > > > > > > > > > > > > > > > > hotel > > > > > > > > > > > > > > > > arrangements? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for whatever > > > > > > > > > > > > > > > > portion > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > meeting > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > feel is of value. The only thing I ask is > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > Wed > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > Thu > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > stay available online for user-support or > > > > > > > > > > > > > > > > other > > > > > > > > > > > > > > > > assistance > > > > > > > > > > > > > > > > needs > > > > > > > > > > > > > > > > that come up here. And that you engage > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > people > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > help > > > > > > > > > > > > > > > > us > > > > > > > > > > > > > > > > develop the Swift user community and > > > > > > > > > > > > > > > > reliable > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > usage. > > > > > > > > > > > > > > > > Rob, > > > > > > > > > > > > > > > > Marco, > > > > > > > > > > > > > > > > Lincoln, and Suchandra would be good to > > > > > > > > > > > > > > > > hang > > > > > > > > > > > > > > > > out > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > they > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > introduce you to good contacts. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses via > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > UChicago > > > > > > > > > > > > > > > > travel > > > > > > > > > > > > > > > > expense > > > > > > > > > > > > > > > > report. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a tiny > > > > > > > > > > > > > > > > bit > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > additional > > > > > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > > > > > funds to make Swift do smarter data > > > > > > > > > > > > > > > > management > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > sites > > > > > > > > > > > > > > > > (and > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > general) so anything you learn about OSG > > > > > > > > > > > > > > > > storage > > > > > > > > > > > > > > > > elements/services/tools will be valuable > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just > > > > > > > > > > > > > > > > focus > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > talk, > > > > > > > > > > > > > > > > OK? > > > > > > > > > > > > > > > > Im > > > > > > > > > > > > > > > > hoping > > > > > > > > > > > > > > > > we have slides frozen by Monday. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or > > > > > > > > > > > > > > > > other > > > > > > > > > > > > > > > > hello-world-like > > > > > > > > > > > > > > > > tests > > > > > > > > > > > > > > > > to cover the "routes" we discussed, that > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > pave > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > way > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > plugging in the real app examples. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sound good? Let me know of any concerns > > > > > > > > > > > > > > > > (other > > > > > > > > > > > > > > > > than > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > fact > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > this is a tad rushed ;) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks and regards, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > Michael Wilde > > > > > > > > > > > > > > > > Computation Institute, University of > > > > > > > > > > > > > > > > Chicago > > > > > > > > > > > > > > > > Mathematics and Computer Science Division > > > > > > > > > > > > > > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel 
mailing list
> > > > > > Swift-devel at ci.uchicago.edu
> > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >

From hategan at mcs.anl.gov Sun Mar 10 21:33:35 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Mar 2013 19:33:35 -0700
Subject: [Swift-devel] Swift reference manual and syntax specification
In-Reply-To:
References: <993449932.1291539.1362950888769.JavaMail.root@mcs.anl.gov>
Message-ID: <1362969215.958.5.camel@echo>

On Sun, 2013-03-10 at 18:07 -0500, Tim Armstrong wrote:
> Aside from formal specifications, I think one major gap is concrete
> examples that illuminate the trickier points of semantics. I personally
> find well-constructed examples more helpful for most purposes than formal
> specifications,

I completely agree. I often found our manual to be lacking. When having
forgotten how to do something, I would go to it, but see no examples of,
for example, how to do a tc.data line.

So I think there are some principles that we should observe when writing
docs, like use only one term for a given concept, don't use stuff before
defining it, and always have examples for everything.

Mihael

From hategan at mcs.anl.gov Sun Mar 10 23:09:29 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Mar 2013 21:09:29 -0700
Subject: [Swift-devel] Swift crashing for runs with 1M calls
In-Reply-To:
References:
Message-ID: <1362974969.1383.11.camel@echo>

On Sat, 2013-03-02 at 00:53 +0530, Yadu Nand wrote:
> I'm trying to see swift behavior with some stress on and I see crashes at
> close to 1M calls/loops.
> [...]
> No events in 10s.
> [...]
> Progress: time: Sat, 02 Mar 2013 00:23:08 +0530
> Finding dependency loops...Exception in thread "Hang checker"
> java.lang.StackOverflowError
>     at java.util.HashMap.put(HashMap.java:484)
>     at java.util.HashSet.add(HashSet.java:217)
>     at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:299)
>     at org.griphyn.vdl.karajan.HangChecker.findLoop(HangChecker.java:303)

Yeah, the algorithm for finding loops is recursive. Since it has to deal
with 1M threads, it does fail. The code may not be recursive, but the
dependency is, so when the hang checker builds the dependency graph, it
will essentially try to build a graph that tracks where, say,
array[1000000] is. The complexity of that is 2^n in this case, so,
yeah...

One solution may be to stop the hang checker when things get too big.
That way it can still be useful for normal stuff.

The hang checker's invocation here is unfortunate though. The assignment

int range[] = [2:limit:1];

does the silly thing of actually creating an array with 1M elements (if
you say foreach v in range..., that does not happen). It may be useful
to change setFieldValue to do a simple reference assignment instead of
copying the entire array, but that might not be easy given the way that
assignment is implemented (i.e. not when the array is created).

Mihael

From wilde at mcs.anl.gov Sun Mar 10 23:32:35 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sun, 10 Mar 2013 23:32:35 -0500 (CDT)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1362968914.958.0.camel@echo>
Message-ID: <584154174.1299489.1362976355139.JavaMail.root@mcs.anl.gov>

OK, I think I've got the right code now. It worked well on some runs
(with 600KB input files to multiple sites) that were getting errors on
trunk. My first test on this 0.94 branch code using big files was run054
(midway:/home/wilde/osgdemo/modis/svn) and got the errors below.
It didn't seem to get past the staging in of the first 48 jobs (48 was
the job throttle), gave one coaster error, and then just started spewing
the coaster errors below.

- Mike

Swift 0.94 swift-r6362 cog-r3637

RunID: 20130311-0424-sj1nz1p5
Progress: time: Mon, 11 Mar 2013 04:24:34 +0000
Progress: time: Mon, 11 Mar 2013 04:24:44 +0000 Selecting site:269 Submitting:47 Submitted:1
Progress: time: Mon, 11 Mar 2013 04:24:51 +0000 Selecting site:269 Stage in:1 Submitted:47
Progress: time: Mon, 11 Mar 2013 04:25:04 +0000 Selecting site:269 Stage in:48
Progress: time: Mon, 11 Mar 2013 04:25:34 +0000 Selecting site:269 Stage in:48
Progress: time: Mon, 11 Mar 2013 04:26:04 +0000 Selecting site:269 Stage in:48
Progress: time: Mon, 11 Mar 2013 04:26:34 +0000 Selecting site:269 Stage in:48
Channels: {null at id://u291d7d28-13d57af2819--8000-u4607bc92-13d57af282b--8000S=MetaChannel[service-60829] -> BufferingChannel, null at id://u-35c43cfc-13d57b18527--8000-u-17a12b46-13d57b1854f--8000S=MetaChannel[service-60851] -> GSSSChannel-1577167763(5)[service-60851], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50002=MetaChannel[service-60851] -> GSSSChannel-1577167763(5)[service-60851]}
Context: service-60860
Meta context: service-60829
Progress: time: Mon, 11 Mar 2013 04:27:04 +0000 Selecting site:269 Stage in:48
Progress: time: Mon, 11 Mar 2013 04:27:34 +0000 Selecting site:269 Stage in:48
Progress: time: Mon, 11 Mar 2013 04:28:04 +0000 Selecting site:269 Stage in:48
Progress: time: Mon, 11 Mar 2013 04:28:34 +0000 Selecting site:269 Stage in:48
Channels: {null at id://u291d7d28-13d57af2819--8000-u4607bc92-13d57af282b--8000S=MetaChannel[service-60829] -> BufferingChannel, null at id://u-35c43cfc-13d57b18527--8000-u-17a12b46-13d57b1854f--8000S=MetaChannel[service-60851] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50002=MetaChannel[service-60851] -> BufferingChannel}
Context: service-60504
Meta context: service-60851
Channels: {null at id://u291d7d28-13d57af2819--8000-u4607bc92-13d57af282b--8000S=MetaChannel[service-60829] -> BufferingChannel, null at id://u-35c43cfc-13d57b18527--8000-u-17a12b46-13d57b1854f--8000S=MetaChannel[service-60851] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50002=MetaChannel[service-60851] -> BufferingChannel}
Context: service-60813
Meta context: service-60851
Channels: {null at id://u291d7d28-13d57af2819--8000-u4607bc92-13d57af282b--8000S=MetaChannel[service-60829] -> BufferingChannel, null at id://u-35c43cfc-13d57b18527--8000-u-17a12b46-13d57b1854f--8000S=MetaChannel[service-60851] -> BufferingChannel, /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50002=MetaChannel[service-60851] -> BufferingChannel}
Context: service-60509

and so on, spewing rapidly!

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Michael Wilde"
> Cc: "Swift Devel"
> Sent: Sunday, March 10, 2013 9:28:34 PM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
> Nono, I did the commits in the 0.94 branch, not trunk.
>
> Mihael
>
> On Sun, 2013-03-10 at 16:37 -0500, Michael Wilde wrote:
> > Mihael, it seems that the problem is still there under the current
> > trunk - see below. This is in:
> >
> > midway:/home/wilde/osgdemo/modis/svn/run035.
> >
> > The "cog modified locally" is a hopefully inconsequential change in
> > worker.pl where I open stdin to /dev/null rather than close it,
> > before launching an app, to remedy an unrelated MPI problem.
> > > > - Mike > > > > Swift trunk swift-r6362 cog-r3637 (cog modified locally) > > > > RunID: 20130310-2055-4lqjiftd > > Progress: time: Sun, 10 Mar 2013 20:55:52 +0000 > > Progress: time: Sun, 10 Mar 2013 20:56:06 +0000 Selecting > > site:269 Submitting:47 Submitted:1 > > Progress: time: Sun, 10 Mar 2013 20:56:12 +0000 Selecting > > site:269 Stage in:1 Submitted:47 > > Progress: time: Sun, 10 Mar 2013 20:56:16 +0000 Selecting > > site:269 Stage in:25 Submitted:23 > > Progress: time: Sun, 10 Mar 2013 20:56:22 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 20:56:52 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 20:57:22 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 20:57:52 +0000 Selecting > > site:269 Stage in:48 > > Progress: time: Sun, 10 Mar 2013 20:58:19 +0000 Selecting > > site:269 Stage in:47 Active:1 > > Progress: time: Sun, 10 Mar 2013 20:58:20 +0000 Selecting > > site:269 Stage in:26 Active:22 > > Progress: time: Sun, 10 Mar 2013 20:58:22 +0000 Selecting > > site:269 Stage in:24 Active:24 > > Progress: time: Sun, 10 Mar 2013 20:58:24 +0000 Selecting > > site:269 Stage in:23 Active:25 > > Progress: time: Sun, 10 Mar 2013 20:58:26 +0000 Selecting > > site:269 Active:47 Stage out:1 > > Progress: time: Sun, 10 Mar 2013 20:58:27 +0000 Selecting > > site:260 Stage in:7 Submitting:1 Submitted:1 Active:39 > > Finished successfully:9 > > Progress: time: Sun, 10 Mar 2013 20:58:28 +0000 Selecting > > site:258 Stage in:9 Submitting:1 Submitted:1 Active:24 Stage > > out:13 Finished successfully:11 > > Progress: time: Sun, 10 Mar 2013 20:58:29 +0000 Selecting > > site:245 Stage in:23 Submitted:1 Active:24 Finished > > successfully:24 > > Progress: time: Sun, 10 Mar 2013 20:58:31 +0000 Selecting > > site:245 Stage in:24 Active:23 Stage out:1 Finished > > successfully:24 > > Progress: time: Sun, 10 Mar 2013 20:58:32 +0000 Selecting > > site:245 Stage in:24 Active:23 Finished 
successfully:25 > > Progress: time: Sun, 10 Mar 2013 20:58:34 +0000 Selecting > > site:244 Stage in:24 Submitting:1 Stage out:22 Finished > > successfully:26 > > Progress: time: Sun, 10 Mar 2013 20:58:35 +0000 Selecting > > site:221 Stage in:25 Submitting:22 Submitted:1 Finished > > successfully:48 > > Progress: time: Sun, 10 Mar 2013 20:58:52 +0000 Selecting > > site:221 Stage in:48 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 20:59:22 +0000 Selecting > > site:221 Stage in:48 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 20:59:52 +0000 Selecting > > site:221 Stage in:48 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 20:59:56 +0000 Selecting > > site:221 Stage in:47 Active:1 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 21:00:02 +0000 Selecting > > site:221 Stage in:47 Stage out:1 Finished successfully:48 > > Progress: time: Sun, 10 Mar 2013 21:00:05 +0000 Selecting > > site:221 Stage in:47 Finished successfully:49 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] > > -> BufferingChannel, > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] > > -> BufferingChannel, > > null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > > Context: service-60018 > > Meta context: service-60734 > > Progress: time: Sun, 10 Mar 2013 21:00:07 +0000 Selecting > > site:220 Stage in:47 Submitted:1 Finished successfully:49 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at 
id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] > > -> BufferingChannel, > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] > > -> BufferingChannel, > > null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > > Context: service-60263 > > Meta context: service-60734 > > Channels: > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > null at id://u-3bec3eab-13d5616c4bd--8000-u4684d136-13d5616c4d0--8000S=MetaChannel[service-60734] > > -> BufferingChannel, > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60734] > > -> BufferingChannel, > > null at id://u4684d136-13d5616c4d0--7fff-u-3bec3eab-13d5616c4bd--7fffC=MetaChannel[https://192.5.86.107:50000] > > -> > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000]} > > Context: service-60408 > > Meta context: service-60734 > > Progress: time: Sun, 10 Mar 2013 21:00:18 +0000 Selecting > > site:220 Stage in:47 Active:1 Finished successfully:49 > > Progress: time: Sun, 10 Mar 2013 21:00:19 +0000 Selecting > > site:220 Stage in:46 Active:2 Finished successfully:49 > > Execution failed: > > Exception in getlanduse: > > Arguments: > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h12v09.rgb] > > Host: beagle > > Directory: > > modis02-20130310-2055-4lqjiftd/jobs/y/getlanduse-yht64f6l > > > > Caused by: > > Shutting down worker > > getLandUse, modis02.swift, line 20 > > error null > > > > real 4m29.509s > > user 2m45.981s > > sys 0m3.520s > > > > > > ----- Original Message ----- > > > From: "Michael Wilde" > > > To: "Mihael Hategan" > > > Cc: "Swift Devel" > > > Sent: Sunday, March 10, 2013 3:20:53 PM > > > Subject: Re: [Swift-devel] Cant get 
auto-coasters to run from > > > midway to beagle > > > > > > Duh. Thank you. I didn't build a new release, was using same > > > 0.94 > > > RC4 code. > > > > > > Sorry about that. Will retest. > > > > > > - Mike > > > > > > > > > ----- Original Message ----- > > > > From: "Mihael Hategan" > > > > To: "Michael Wilde" > > > > Cc: "Swift Devel" > > > > Sent: Sunday, March 10, 2013 3:06:25 PM > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > > > midway to beagle > > > > > > > > ChannelContext Notifying commands and handlers about exception > > > > org.globus.cog.karajan.workflow.service.TimeoutException: > > > > Channel > > > > timed > > > > out. lastTime=940817-071255.807, now=130310-164156.506, > > > > channel=GSSSChannel-1463847073(1)[service-60519] > > > > > > > > Are you sure you are running with the latest code? There was > > > > a > > > > (inconsequential mostly) bug before that set lastTime to > > > > Long.MAX_VALUE > > > > before creating that exception. That was fixed. Your message > > > > indicates > > > > the code you are using does not have that fix (year xx94 is > > > > what > > > > comes > > > > out of Long.MAX_VALUE). > > > > > > > > I gotta go now, but I'll come back later and check some more. > > > > There > > > > is > > > > something weird going on there besides that. > > > > > > > > Mihael > > > > > > > > On Sun, 2013-03-10 at 12:01 -0500, Michael Wilde wrote: > > > > > Here's run034: seems to be a bit better, but still dies. > > > > > This is > > > > > with throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc > > > > > to > > > > > beagle. 17MB files. Still seems to curiously die about 4 > > > > > mins > > > > > into the run, which suggests some kind of timeout is still > > > > > lurking??? 
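The "year xx94" remark in Mihael's message above can be checked with a few lines of Java. This is only a sketch under assumptions: the pattern "yyMMdd-HHmmss.SSS" and the UTC time zone are inferred from the shape of the lastTime/now values in the exception text, and the class name MaxTimeDemo is hypothetical. None of this is taken from the CoG/coaster sources.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical demo class; not part of Swift or CoG.
public class MaxTimeDemo {

    // Format epoch milliseconds the way the log line appears to
    // (pattern and time zone are assumptions inferred from the log).
    static String format(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyMMdd-HHmmss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        // Long.MAX_VALUE milliseconds after the epoch falls in the year
        // 292278994; a two-digit "yy" keeps only the trailing "94".
        System.out.println(format(Long.MAX_VALUE));
    }
}
```

If the assumptions hold, the printed value lines up with the lastTime field in the TimeoutException text, which is why that field is a quick way to tell whether a build still predates the fix.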
> > > > > > > > > > - Mike > > > > > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally) > > > > > > > > > > RunID: 20130310-1639-kyb8hca9 > > > > > Progress: time: Sun, 10 Mar 2013 16:39:45 +0000 > > > > > Progress: time: Sun, 10 Mar 2013 16:39:56 +0000 Selecting > > > > > site:269 Submitting:47 Submitted:1 > > > > > Progress: time: Sun, 10 Mar 2013 16:40:01 +0000 Selecting > > > > > site:269 Stage in:1 Submitted:47 > > > > > Progress: time: Sun, 10 Mar 2013 16:40:15 +0000 Selecting > > > > > site:269 Stage in:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:40:45 +0000 Selecting > > > > > site:269 Stage in:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:41:15 +0000 Selecting > > > > > site:269 Stage in:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:41:45 +0000 Selecting > > > > > site:269 Stage in:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:11 +0000 Selecting > > > > > site:269 Stage in:47 Active:1 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:12 +0000 Selecting > > > > > site:269 Stage in:41 Active:7 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:13 +0000 Selecting > > > > > site:269 Stage in:23 Active:25 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:15 +0000 Selecting > > > > > site:269 Active:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:17 +0000 Selecting > > > > > site:269 Active:47 Stage out:1 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:18 +0000 Selecting > > > > > site:268 Stage in:1 Active:46 Stage out:1 Finished > > > > > successfully:1 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:19 +0000 Selecting > > > > > site:265 Stage in:3 Submitted:1 Active:42 Stage out:2 > > > > > Finished successfully:4 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:20 +0000 Selecting > > > > > site:258 Stage in:6 Submitting:5 Active:23 Stage out:13 > > > > > Finished successfully:12 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:21 +0000 Selecting > > > > > site:244 Stage in:24 Submitting:1 Active:20 Stage out:3 > > > > > 
Finished successfully:25 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:23 +0000 Selecting > > > > > site:241 Stage in:25 Submitting:3 Stage out:19 Finished > > > > > successfully:29 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:24 +0000 Selecting > > > > > site:221 Stage in:28 Submitting:19 Submitted:1 Finished > > > > > successfully:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:45 +0000 Selecting > > > > > site:221 Stage in:48 Finished successfully:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:42:54 +0000 Selecting > > > > > site:221 Stage in:47 Active:1 Finished successfully:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:43:00 +0000 Selecting > > > > > site:221 Stage in:47 Stage out:1 Finished successfully:48 > > > > > Progress: time: Sun, 10 Mar 2013 16:43:02 +0000 Selecting > > > > > site:221 Stage in:47 Finished successfully:49 > > > > > Progress: time: Sun, 10 Mar 2013 16:43:05 +0000 Selecting > > > > > site:220 Stage in:47 Submitted:1 Finished successfully:49 > > > > > Progress: time: Sun, 10 Mar 2013 16:43:15 +0000 Selecting > > > > > site:220 Stage in:48 Finished successfully:49 > > > > > Progress: time: Sun, 10 Mar 2013 16:43:45 +0000 Selecting > > > > > site:220 Stage in:48 Finished successfully:49 > > > > > Channels: > > > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > > > -> > > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > > > -> BufferingChannel, > > > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > > > -> > > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > > > -> BufferingChannel} > > > > > Context: service-60859 > > > > > Meta context: 
service-60519 > > > > > Progress: time: Sun, 10 Mar 2013 16:43:59 +0000 Selecting > > > > > site:220 Stage in:47 Active:1 Finished successfully:49 > > > > > Channels: > > > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > > > -> > > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > > > -> BufferingChannel, > > > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > > > -> > > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > > > -> BufferingChannel} > > > > > Context: service-60663 > > > > > Meta context: service-60519 > > > > > Progress: time: Sun, 10 Mar 2013 16:44:05 +0000 Selecting > > > > > site:220 Stage in:47 Stage out:1 Finished successfully:49 > > > > > Progress: time: Sun, 10 Mar 2013 16:44:07 +0000 Selecting > > > > > site:220 Stage in:47 Finished successfully:50 > > > > > Channels: > > > > > {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] > > > > > -> > > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > > /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] > > > > > -> BufferingChannel, > > > > > null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] > > > > > -> > > > > > GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], > > > > > null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] > > > > > -> BufferingChannel} > > > > > Context: service-60081 > > > > > Meta context: service-60519 > > > > > Progress: time: Sun, 10 Mar 2013 16:44:09 +0000 Selecting > > > > > 
site:219 Stage in:45 Submitting:1 Active:2 Finished > > > > > successfully:50 > > > > > Execution failed: > > > > > Exception in getlanduse: > > > > > Arguments: > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb] > > > > > Host: beagle > > > > > Directory: > > > > > modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l > > > > > > > > > > Caused by: > > > > > Shutting down worker > > > > > getLandUse, modis02.swift, line 20 > > > > > error null > > > > > > > > > > real 4m27.007s > > > > > user 2m44.221s > > > > > sys 0m3.448s > > > > > + mv /home/wilde/.swift/runs/current/run034.1362933583 > > > > > /home/wilde/.swift/runs/completed > > > > > midway001$ > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: "Mihael Hategan" > > > > > > To: "Michael Wilde" > > > > > > Cc: "Swift Devel" > > > > > > Sent: Sunday, March 10, 2013 1:36:26 AM > > > > > > Subject: Re: [Swift-devel] Cant get auto-coasters to run > > > > > > from > > > > > > midway to beagle > > > > > > > > > > > > Please try now. I made some changes: > > > > > > > > > > > > 1. start the service with "-l" so that things in your > > > > > > .profile > > > > > > (such > > > > > > as > > > > > > module load sun-java) would be picked up. However, this > > > > > > also > > > > > > means > > > > > > that > > > > > > you should unset X509_* stuff or the sshcl proxy forwarding > > > > > > will > > > > > > not > > > > > > work properly. > > > > > > > > > > > > 2. I fixed a bug that caused an extra connection to the > > > > > > coaster > > > > > > service. > > > > > > Normally the service connects back to the client and both > > > > > > use > > > > > > that > > > > > > connection. 
However, due to some changes in the way > > > > > > credentials > > > > > > were > > > > > > set > > > > > > for jobs, and the fact that connections were looked up > > > > > > based on > > > > > > both > > > > > > hostname and credential, the coaster client would ignore > > > > > > the > > > > > > existing > > > > > > connection and create another one. The initial one would > > > > > > then > > > > > > time > > > > > > out > > > > > > at > > > > > > some point causing the service to crash. > > > > > > > > > > > > Mihael > > > > > > > > > > > > On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote: > > > > > > > An update on this provider staging related issue: > > > > > > > reducing > > > > > > > filesize > > > > > > > from 17MB to 600KB runs well. > > > > > > > > > > > > > > So seems like some kind of flow control or buffer > > > > > > > management > > > > > > > problem, possibly? > > > > > > > > > > > > > > May need to take that problem offline - would be a > > > > > > > perfect > > > > > > > test > > > > > > > case for Yadu to develop a new stress test for. > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > ----- Forwarded Message ----- > > > > > > > From: "Michael Wilde" > > > > > > > To: "David Kelly" > > > > > > > Sent: Saturday, March 9, 2013 5:21:49 PM > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > OK, much better: with 600K files (5x5 reduction or 25X > > > > > > > smaller) > > > > > > > it > > > > > > > works well, and fast (from midway to beagle!) 
> > > > > > > > > > > > > > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified > > > > > > > locally) > > > > > > > > > > > > > > RunID: 20130309-2319-5zq0jrfg > > > > > > > Progress: time: Sat, 09 Mar 2013 23:19:45 +0000 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:19:56 +0000 > > > > > > > Selecting > > > > > > > site:269 Submitting:47 Submitted:1 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:05 +0000 > > > > > > > Selecting > > > > > > > site:269 Stage in:1 Submitted:47 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:09 +0000 > > > > > > > Selecting > > > > > > > site:269 Stage in:47 Active:1 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:10 +0000 > > > > > > > Selecting > > > > > > > site:269 Stage in:46 Active:1 Stage out:1 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:11 +0000 > > > > > > > Selecting > > > > > > > site:250 Stage in:19 Active:28 Stage out:1 Finished > > > > > > > successfully:19 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:12 +0000 > > > > > > > Selecting > > > > > > > site:229 Stage in:18 Submitting:21 Active:1 Stage > > > > > > > out:7 > > > > > > > Finished successfully:41 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:13 +0000 > > > > > > > Selecting > > > > > > > site:220 Stage in:41 Submitting:1 Active:5 Stage > > > > > > > out:1 > > > > > > > Finished successfully:49 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:14 +0000 > > > > > > > Selecting > > > > > > > site:220 Stage in:38 Active:1 Stage out:9 Finished > > > > > > > successfully:49 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:15 +0000 > > > > > > > Selecting > > > > > > > site:212 Stage in:30 Submitting:8 Stage out:9 > > > > > > > Finished > > > > > > > successfully:58 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:16 +0000 > > > > > > > Selecting > > > > > > > site:203 Stage in:38 Submitting:8 Submitted:1 > > > > > > > Finished > > > > > > > successfully:67 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:18 
+0000 > > > > > > > Selecting > > > > > > > site:202 Stage in:19 Stage out:28 Finished > > > > > > > successfully:68 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:19 +0000 > > > > > > > Selecting > > > > > > > site:172 Stage in:33 Submitting:2 Submitted:6 > > > > > > > Active:5 > > > > > > > Stage > > > > > > > out:2 Finished successfully:97 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:20 +0000 > > > > > > > Selecting > > > > > > > site:170 Stage in:31 Submitting:2 Stage out:14 > > > > > > > Finished > > > > > > > successfully:100 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:21 +0000 > > > > > > > Selecting > > > > > > > site:162 Stage in:30 Submitting:10 Stage out:6 > > > > > > > Finished > > > > > > > successfully:109 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:22 +0000 > > > > > > > Selecting > > > > > > > site:154 Stage in:39 Submitting:5 Submitted:3 > > > > > > > Active:1 > > > > > > > Finished successfully:115 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:23 +0000 > > > > > > > Selecting > > > > > > > site:154 Stage in:21 Active:10 Stage out:16 Finished > > > > > > > successfully:116 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:24 +0000 > > > > > > > Selecting > > > > > > > site:126 Stage in:20 Submitting:25 Submitted:1 Stage > > > > > > > out:2 > > > > > > > Finished successfully:143 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:25 +0000 > > > > > > > Selecting > > > > > > > site:124 Stage in:31 Active:2 Stage out:15 Finished > > > > > > > successfully:145 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:26 +0000 > > > > > > > Selecting > > > > > > > site:110 Stage in:30 Submitting:14 Stage out:3 > > > > > > > Finished > > > > > > > successfully:160 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:27 +0000 > > > > > > > Selecting > > > > > > > site:106 Stage in:43 Submitting:1 Submitted:1 > > > > > > > Active:1 > > > > > > > Stage > > > > > > > out:2 Finished successfully:163 > > > > > > > 
Progress: time: Sat, 09 Mar 2013 23:20:28 +0000 > > > > > > > Selecting > > > > > > > site:104 Stage in:20 Submitting:2 Active:7 Stage > > > > > > > out:19 > > > > > > > Finished successfully:165 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:29 +0000 > > > > > > > Selecting > > > > > > > site:78 > > > > > > > Stage in:29 Submitting:16 Submitted:1 Stage out:2 > > > > > > > Finished > > > > > > > successfully:191 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:31 +0000 > > > > > > > Selecting > > > > > > > site:76 > > > > > > > Stage in:30 Stage out:17 Finished successfully:194 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:32 +0000 > > > > > > > Selecting > > > > > > > site:58 > > > > > > > Stage in:29 Submitting:18 Active:1 Finished > > > > > > > successfully:211 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:33 +0000 > > > > > > > Selecting > > > > > > > site:58 > > > > > > > Stage in:33 Active:3 Stage out:12 Finished > > > > > > > successfully:211 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:34 +0000 > > > > > > > Selecting > > > > > > > site:46 > > > > > > > Stage in:18 Submitting:11 Submitted:1 Active:2 Stage > > > > > > > out:14 > > > > > > > Finished successfully:225 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:35 +0000 > > > > > > > Selecting > > > > > > > site:30 > > > > > > > Stage in:29 Active:14 Stage out:3 Finished > > > > > > > successfully:241 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:36 +0000 > > > > > > > Selecting > > > > > > > site:28 > > > > > > > Stage in:28 Submitting:2 Stage out:17 Finished > > > > > > > successfully:242 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:37 +0000 > > > > > > > Selecting > > > > > > > site:10 > > > > > > > Stage in:30 Submitting:17 Submitted:1 Finished > > > > > > > successfully:259 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:38 +0000 > > > > > > > Selecting > > > > > > > site:10 > > > > > > > Stage in:35 Stage out:13 Finished successfully:259 > > > 
> > > > Progress: time: Sat, 09 Mar 2013 23:20:39 +0000 Stage > > > > > > > in:21 > > > > > > > Submitting:6 Submitted:3 Stage out:15 Finished > > > > > > > successfully:272 > > > > > > > Progress: time: Sat, 09 Mar 2013 23:20:40 +0000 Stage > > > > > > > in:10 > > > > > > > Active:5 Stage out:14 Finished successfully:288 > > > > > > > Final status: Sat, 09 Mar 2013 23:20:41 +0000 Finished > > > > > > > successfully:317 > > > > > > > > > > > > > > real 0m58.953s > > > > > > > user 0m32.573s > > > > > > > sys 0m1.263s > > > > > > > + mv /home/wilde/.swift/runs/current/run029.1362871183 > > > > > > > /home/wilde/.swift/runs/completed > > > > > > > midway001$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > From: "David Kelly" > > > > > > > > To: "Michael Wilde" > > > > > > > > Sent: Saturday, March 9, 2013 5:12:59 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > Yep - I had a version where the input files were in a > > > > > > > > very > > > > > > > > similar > > > > > > > > format (PGM, 1 byte per pixel). I'll add that back, but > > > > > > > > without > > > > > > > > the > > > > > > > > small PGM header in the files. > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > To: "David Kelly" > > > > > > > > Sent: Saturday, March 9, 2013 5:04:43 PM > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > I think we need to cut down the size of these files for > > > > > > > > a > > > > > > > > demo > > > > > > > > (although they are great for a stress test). > > > > > > > > > > > > > > > > First, the RGB format by itself uses 3 bytes per pixel > > > > > > > > when > > > > > > > > it > > > > > > > > only > > > > > > > > needs one (for land use) > > > > > > > > > > > > > > > > Second, we should cut down by a factor of 9 (3x3) or 16 > > > > > > > > (4x4). 
> > > > > > > > > > > > > > > > I tried that using simple convert statements, but it > > > > > > > > always > > > > > > > > seems > > > > > > > > to > > > > > > > > yield a file exactly double what it should be. > > > > > > > > > > > > > > > > More on this later; was hoping to get things working > > > > > > > > "as > > > > > > > > is" > > > > > > > > first. > > > > > > > > > > > > > > > > I assume you could get the perl code to work on > > > > > > > > one-byte-per-pixel > > > > > > > > instead of the default 3 for the convert rgb format? > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "David Kelly" > > > > > > > > > To: "Michael Wilde" > > > > > > > > > Sent: Saturday, March 9, 2013 4:36:30 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > That would probably be a good idea for a new script, > > > > > > > > > to > > > > > > > > > show > > > > > > > > > how to > > > > > > > > > stage apps like that. For now I updated the scripts > > > > > > > > > on > > > > > > > > > lustre.. > > > > > > > > > hopefully that helps. > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > To: "David Kelly" > > > > > > > > > Sent: Saturday, March 9, 2013 4:29:14 PM > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > OK, I see that it's trying to run getlanduse.sh from > > > > > > > > > your > > > > > > > > > /lustre > > > > > > > > > dir > > > > > > > > > on beagle, which is different than the one I've got > > > > > > > > > checked > > > > > > > > > out. > > > > > > > > > It > > > > > > > > > seems to get an error in a stderr redirect??? Let me > > > > > > > > > see > > > > > > > > > what I > > > > > > > > > need > > > > > > > > > to do to get the beagle side in sync. 
> > > > > > > > > > > > > > > > > > Seems like since these are perl scripts, we should > > > > > > > > > make > > > > > > > > > the > > > > > > > > > app() > > > > > > > > > /bin/sh and send the script as data, perhaps? > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > To: "David Kelly" > > > > > > > > > > Sent: Saturday, March 9, 2013 4:19:31 PM > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > OK, making progress. Now I dialed down the throttle > > > > > > > > > > and > > > > > > > > > > node > > > > > > > > > > counts > > > > > > > > > > to 48 jobs. > > > > > > > > > > > > > > > > > > > > Now I get further, for ./demo and site=4 script=2: > > > > > > > > > > > > > > > > > > > > RunID: 20130309-2214-1oi3rvea > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Submitting:47 Submitted:1 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:1 Submitted:47 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:25 Submitted:23 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:48 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:48 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:48 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 
22:16:06 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:48 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:47 Active:1 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:36 Active:12 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:24 Active:24 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:24 Active:23 Stage out:1 > > > > > > > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 > > > > > > > > > > Selecting > > > > > > > > > > site:269 > > > > > > > > > > Stage in:14 Active:33 Stage out:1 > > > > > > > > > > Execution failed: > > > > > > > > > > Exception in getlanduse: > > > > > > > > > > Arguments: > > > > > > > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb] > > > > > > > > > > Host: beagle > > > > > > > > > > Directory: > > > > > > > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l > > > > > > > > > > > > > > > > > > > > Caused by: > > > > > > > > > > Application > > > > > > > > > > /lustre/beagle/davidk/modis/bin/getlanduse.sh > > > > > > > > > > failed > > > > > > > > > > with an exit code of 1 > > > > > > > > > > getLandUse, modis02.swift, line 20 > > > > > > > > > > > > > > > > > > > > real 2m31.463s > > > > > > > > > > user 1m33.238s > > > > > > > > > > sys 0m2.160s > > > > > > > > > > + mv > > > > > > > > > > /home/wilde/.swift/runs/current/run024.1362867244 > > > > > > > > > > /home/wilde/.swift/runs/completed > > > > > > > > > > midway001$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > 
From: "David Kelly" > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ok, I'll take a look at that. The run dir I used > > > > > > > > > > > was > > > > > > > > > > > /scratch/midway/davidkelly999/modis/run011 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > I just tried this, but didnt work - same prob. > > > > > > > > > > > > > > > > > > > > > > But if its working for you now, we must be close. > > > > > > > > > > > > > > > > > > > > > > Not yet sure what the diff is... > > > > > > > > > > > > > > > > > > > > > > My run dir is > > > > > > > > > > > /home/wilde/osgdemo/modis/svn/run021 > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Had to make sure I was using the IP address on > > > > > > > > > > > > eth4 > > > > > > > > > > > > (128.135.112.71 > > > > > > > > > > > > for midway-login1), not a local address or an > > > > > > > > > > > > infiniband > > > > > > > > > > > > address. 
> > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I just got it working. I had to adjust for the > > > > > > > > > > > > differences in > > > > > > > > > > > > my > > > > > > > > > > > > username on Beagle/Midway, then I had to set > > > > > > > > > > > > GLOBUS_HOSTNAME > > > > > > > > > > > > on > > > > > > > > > > > > Midway to the IP address, rather than the full > > > > > > > > > > > > hostname > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is your username the same on beagle and > > > > > > > > > > > > > midway? > > > > > > > > > > > > > > > > > > > > > > > > Yes. 
And I verified that I can ssh to login4 on > > > > > > > > > > > > beagle > > > > > > > > > > > > from > > > > > > > > > > > > my > > > > > > > > > > > > midway > > > > > > > > > > > > session (as indeed the scp's of the proxy files > > > > > > > > > > > > seem > > > > > > > > > > > > to > > > > > > > > > > > > be > > > > > > > > > > > > working) > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > OK. > > > > > > > > > > > > > > > > > > > > > > > > > > Ignore what I said about "problem finding > > > > > > > > > > > > > java" - > > > > > > > > > > > > > thats > > > > > > > > > > > > > code > > > > > > > > > > > > > in > > > > > > > > > > > > > the > > > > > > > > > > > > > very long escaped shell command that gets > > > > > > > > > > > > > sent to > > > > > > > > > > > > > the > > > > > > > > > > > > > remote > > > > > > > > > > > > > side. > > > > > > > > > > > > > I > > > > > > > > > > > > > dont *think* thats the problem. > > > > > > > > > > > > > > > > > > > > > > > > > > I also verified that beagle can connect to > > > > > > > > > > > > > ports > > > > > > > > > > > > > 50001 > > > > > > > > > > > > > etc > > > > > > > > > > > > > on > > > > > > > > > > > > > swift.rcc, and that seems OK. > > > > > > > > > > > > > > > > > > > > > > > > > > I exported > > > > > > > > > > > > > GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu > > > > > > > > > > > > > on > > > > > > > > > > > > > the > > > > > > > > > > > > > midway > > > > > > > > > > > > > side. And the beagle side seems to be > > > > > > > > > > > > > connecting > > > > > > > > > > > > > there. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Im a bit confused about the timestamps I see > > > > > > > > > > > > > for > > > > > > > > > > > > > the > > > > > > > > > > > > > proxy > > > > > > > > > > > > > expiration > > > > > > > > > > > > > time, but am not yet suspicious of that > > > > > > > > > > > > > (although > > > > > > > > > > > > > it > > > > > > > > > > > > > seems > > > > > > > > > > > > > less > > > > > > > > > > > > > than > > > > > > > > > > > > > 5 hours past GMT... not sure.) > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm seeing the same error now.. looking > > > > > > > > > > > > > > into it > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking deeper I see that the logs show > > > > > > > > > > > > > > problems > > > > > > > > > > > > > > with > > > > > > > > > > > > > > finding > > > > > > > > > > > > > > Java, > > > > > > > > > > > > > > I > > > > > > > > > > > > > > assume on beagle, ans also service ending > > > > > > > > > > > > > > (presumably > > > > > > > > > > > > > > coaster > > > > > > > > > > > > > > service on midway host). > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll dig into these two. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > I see that it scp's the proxies to beagle > > > > > > > > > > > > > > which > > > > > > > > > > > > > > I > > > > > > > > > > > > > > think > > > > > > > > > > > > > > answers > > > > > > > > > > > > > > my > > > > > > > > > > > > > > question about security. > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OK. Any thoughts about beagle? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ive been experimenting but still cant get > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > work, > > > > > > > > > > > > > > > same > > > > > > > > > > > > > > > error > > > > > > > > > > > > > > > (cant connect to bootstrap port) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > WHen you tried ssh-cl to beagle with > > > > > > > > > > > > > > > automatic > > > > > > > > > > > > > > > coasters, > > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > configuration (sites env etc) did you > > > > > > > > > > > > > > > use? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I verified that beagle can connect back > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > midway > > > > > > > > > > > > > > > hosts > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > ports. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Do we need to specify security or create > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > proxy > > > > > > > > > > > > > > > etc? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 > > > > > > > > > > > > > > > > PM > > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One way you can override/customize the > > > > > > > > > > > > > > > > default > > > > > > > > > > > > > > > > templates > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > create > > > > > > > > > > > > > > > > them in $HOME/.swift/sites (I'm not > > > > > > > > > > > > > > > > sure if > > > > > > > > > > > > > > > > that's > > > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > mean > > > > > > > > > > > > > > > > by > > > > > > > > > > > > > > > > a local sites dir or not). But you are > > > > > > > > > > > > > > > > right > > > > > > > > > > > > > > > > about > > > > > > > > > > > > > > > > Midway > > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > > noticed that when using modis it will > > > > > > > > > > > > > > > > sometimes > > > > > > > > > > > > > > > > get > > > > > > > > > > > > > > > > stuck > > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > goes to a queue that is busy. 
Ideally > > > > > > > > > > > > > > > > swift > > > > > > > > > > > > > > > > replication > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > able to help better handle that, but I > > > > > > > > > > > > > > > > haven't > > > > > > > > > > > > > > > > had > > > > > > > > > > > > > > > > much > > > > > > > > > > > > > > > > luck > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > that yet. Another way around this may > > > > > > > > > > > > > > > > be to > > > > > > > > > > > > > > > > add > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > template: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="slurm.exclusive">false > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The swift.log issue was never fixed. It > > > > > > > > > > > > > > > > went > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > swift-devel > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > discussion but was never fixed. I think > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > relatively > > > > > > > > > > > > > > > > simple > > > > > > > > > > > > > > > > though.. probably worth fixing before > > > > > > > > > > > > > > > > release. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 > > > > > > > > > > > > > > > > PM > > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OK, sounds good re the trip plan. Feel > > > > > > > > > > > > > > > > free > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > stay > > > > > > > > > > > > > > > > Tue > > > > > > > > > > > > > > > > night > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > avoid a 4hr drive after a long day. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Im trying the modis demo. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I tried to create a local sites/ dir so > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > modify > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > sites > > > > > > > > > > > > > > > > templates; thats not working for me > > > > > > > > > > > > > > > > either > > > > > > > > > > > > > > > > yet. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For midway, need to force to westmere > > > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > sandyb > > > > > > > > > > > > > > > > (but > > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > both) > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > ensure 1-node jobs, because either > > > > > > > > > > > > > > > > queue > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > get > > > > > > > > > > > > > > > > filled > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > yield an idle node for a long time. > > > > > > > > > > > > > > > > maybe > > > > > > > > > > > > > > > > need to > > > > > > > > > > > > > > > > fiddle > > > > > > > > > > > > > > > > jobsPerNode > > > > > > > > > > > > > > > > to get at least 1 core when the system > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > busy > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > *pretend* > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > its a node. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So to get response I tried beagle-ssh; > > > > > > > > > > > > > > > > That > > > > > > > > > > > > > > > > isnt > > > > > > > > > > > > > > > > working > > > > > > > > > > > > > > > > because > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > template sites file is wrong in swift > > > > > > > > > > > > > > > > 0.94 > > > > > > > > > > > > > > > > rc4. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I also see that swift.log is still > > > > > > > > > > > > > > > > getting > > > > > > > > > > > > > > > > produced - > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > thought > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > eliminated that. 
Did it come back due > > > > > > > > > > > > > > > > to a > > > > > > > > > > > > > > > > problem > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > fix? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll keep hacking; suggestions welcome. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Mike > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > From: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "Michael Wilde" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 > > > > > > > > > > > > > > > > > 12:20:00 PM > > > > > > > > > > > > > > > > > Subject: Re: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mike, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looking more closely at the agenda, I > > > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > most > > > > > > > > > > > > > > > > > interesting/useful talks will be on > > > > > > > > > > > > > > > > > Tuesday. 
> > > > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > > > > I'll > > > > > > > > > > > > > > > > > come > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > Argonne to work on any loose ends and > > > > > > > > > > > > > > > > > put > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > finishing > > > > > > > > > > > > > > > > > touches > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > any slides/runs/scripts, then drive > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > Indianapolis > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > > > > afternoon/evening. I have a hotel > > > > > > > > > > > > > > > > > booked > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > Monday > > > > > > > > > > > > > > > > > night. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll do some runs using the routes we > > > > > > > > > > > > > > > > > talked > > > > > > > > > > > > > > > > > about. > > > > > > > > > > > > > > > > > I'm > > > > > > > > > > > > > > > > > pretty > > > > > > > > > > > > > > > > > sure > > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > have working configurations for > > > > > > > > > > > > > > > > > everything > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > talked > > > > > > > > > > > > > > > > > about, > > > > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > think it's really just a matter of > > > > > > > > > > > > > > > > > plugging > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > apps. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: "Michael Wilde" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To: "David Kelly" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Saturday, March 9, 2013 > > > > > > > > > > > > > > > > > 11:03:15 AM > > > > > > > > > > > > > > > > > Subject: runs for OSG talk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I just wanted to let you know that Im > > > > > > > > > > > > > > > > > looking > > > > > > > > > > > > > > > > > into > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > run > > > > > > > > > > > > > > > > > options > > > > > > > > > > > > > > > > > now. Im hoping to try a few... WIll > > > > > > > > > > > > > > > > > see > > > > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > > > much > > > > > > > > > > > > > > > > > help > > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > need. > > > > > > > > > > > > > > > > > Have > > > > > > > > > > > > > > > > > you decided on a driving time and > > > > > > > > > > > > > > > > > made > > > > > > > > > > > > > > > > > hotel > > > > > > > > > > > > > > > > > arrangements? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I would feel free to stay for > > > > > > > > > > > > > > > > > whatever > > > > > > > > > > > > > > > > > portion > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > > meeting > > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > feel is of value. 
The only thing I > > > > > > > > > > > > > > > > > ask is > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > Wed > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > Thu > > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > stay available online for > > > > > > > > > > > > > > > > > user-support or > > > > > > > > > > > > > > > > > other > > > > > > > > > > > > > > > > > assistance > > > > > > > > > > > > > > > > > needs > > > > > > > > > > > > > > > > > that come up here. And that you > > > > > > > > > > > > > > > > > engage > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > people > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > help > > > > > > > > > > > > > > > > > us > > > > > > > > > > > > > > > > > develop the Swift user community and > > > > > > > > > > > > > > > > > reliable > > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > > usage. > > > > > > > > > > > > > > > > > Rob, > > > > > > > > > > > > > > > > > Marco, > > > > > > > > > > > > > > > > > Lincoln, and Suchandra would be good > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > hang > > > > > > > > > > > > > > > > > out > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > they > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > introduce you to good contacts. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Of course we will cover your expenses > > > > > > > > > > > > > > > > > via > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > UChicago > > > > > > > > > > > > > > > > > travel > > > > > > > > > > > > > > > > > expense > > > > > > > > > > > > > > > > > report. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We'll be starting a project with a > > > > > > > > > > > > > > > > > tiny > > > > > > > > > > > > > > > > > bit > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > additional > > > > > > > > > > > > > > > > > ExTENCI > > > > > > > > > > > > > > > > > funds to make Swift do smarter data > > > > > > > > > > > > > > > > > management > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > > sites > > > > > > > > > > > > > > > > > (and > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > general) so anything you learn about > > > > > > > > > > > > > > > > > OSG > > > > > > > > > > > > > > > > > storage > > > > > > > > > > > > > > > > > elements/services/tools will be > > > > > > > > > > > > > > > > > valuable > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > (srmcp, > > > > > > > > > > > > > > > > > lcgcp, > > > > > > > > > > > > > > > > > etc). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Between now and your talk, lets just > > > > > > > > > > > > > > > > > focus > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > talk, > > > > > > > > > > > > > > > > > OK? > > > > > > > > > > > > > > > > > Im > > > > > > > > > > > > > > > > > hoping > > > > > > > > > > > > > > > > > we have slides frozen by Monday. 
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > While I fiddle, if you could do catsn or other hello-world-like tests
> > > > > > > > > > > > > > > > > to cover the "routes" we discussed, that would pave the way for
> > > > > > > > > > > > > > > > > plugging in the real app examples.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Sound good? Let me know of any concerns (other than the fact that
> > > > > > > > > > > > > > > > > this is a tad rushed ;)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Mike
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > > > > > > Argonne National Laboratory
From hategan at mcs.anl.gov Mon Mar 11 01:06:26 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Mar 2013 23:06:26 -0700
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <584154174.1299489.1362976355139.JavaMail.root@mcs.anl.gov>
References: <584154174.1299489.1362976355139.JavaMail.root@mcs.anl.gov>
Message-ID: <1362981986.3267.6.camel@echo>

How do you get java on that machine?

Mihael

From hategan at mcs.anl.gov Mon Mar 11 01:47:07 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 10 Mar 2013 23:47:07 -0700
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1362981986.3267.6.camel@echo>
References: <584154174.1299489.1362976355139.JavaMail.root@mcs.anl.gov> <1362981986.3267.6.camel@echo>
Message-ID: <1362984427.3637.4.camel@echo>

Nevermind. Found java and some other goodies in your home directory.

I'm seeing something very weird on that machine. The internets are just crazy. Like connections to svn repos fail more often than not (while working fine from anywhere else), ssh to beagle fails randomly, downloads of coaster jars fail randomly (I have not personally seen this one before). All with "connection timed out".

Did you see anything similar or am I just being lucky right now?

Mihael

On Sun, 2013-03-10 at 23:06 -0700, Mihael Hategan wrote:
> How do you get java on that machine?
>
> Mihael

From davidk at ci.uchicago.edu Mon Mar 11 01:53:48 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Mon, 11 Mar 2013 01:53:48 -0500 (CDT)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <1362984427.3637.4.camel@echo>
Message-ID: <488676748.1965033.1362984828774.JavaMail.root@ci.uchicago.edu>

I did see some svn weirdness this weekend on midway.. seemed like checkouts/commits failed about half of the time.

----- Original Message -----
> From: "Mihael Hategan"
> To: "Michael Wilde"
> Cc: "Swift Devel"
> Sent: Monday, March 11, 2013 1:47:07 AM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
> Nevermind. Found java and some other goodies in your home directory.
>
> I'm seeing something very weird on that machine. The internets are just
> crazy. Like connections to svn repos fail more often than not (while
> working fine from anywhere else), ssh to beagle fails randomly,
> downloads of coaster jars fail randomly (I have not personally seen this
> one before). All with "connection timed out".
>
> Did you see anything similar or am I just being lucky right now?
>
> Mihael
>
> On Sun, 2013-03-10 at 23:06 -0700, Mihael Hategan wrote:
> > How do you get java on that machine?
> >
> > Mihael

From benc at hawaga.org.uk Mon Mar 11 04:33:33 2013
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 11 Mar 2013 09:33:33 +0000 (UTC)
Subject: [Swift-devel] Swift reference manual and syntax specification
In-Reply-To:
References: <1649371076.1294063.1362958013186.JavaMail.root@mcs.anl.gov>
Message-ID:

Yeah that looks pretty much like it. I guess I should've written it down before then I could have had a paper... Nice to see that someone else thinks my idea is not complete crackheadedness, though.

One thing that I like from 1987 is this, which is not to do with parallel/hpc programming but has similar partial-ordered variables: http://www.cs.ucla.edu/~stott/pop/

> There's been someone at Indiana working on semantics of monotonic variables, which is pretty much exactly
> what Ben described:
> http://www.cs.indiana.edu/~rrnewton/papers/2012-lambdapar-draft.pdf
>
> On Sun, Mar 10, 2013 at 6:46 PM, Ben Clifford wrote:
>
> > Your example is a good one, in that it (and probably several others)
> > would be needed to understand the topic of array closing. The current
> > Swift/T Guide says only "Arrays are part of Swift dataflow semantics. An
> > array is closed when all possible insertions to it are complete" but it
> > doesnt say how to understand that clause "...when all possible
> > insertions to it are complete", in particular what "all possible" means.
> > More examples are needed to understand it fully. Ideally, precise rules
> > and examples complement each other.
>
> wrt this, one approach to array closing that I had before was regarding
> the state of the array as constrained to move on a directed graph of
> states until they reach an end state (the closed state) - that gives not
> really a state machine, but perhaps something a bit like it, onto which
> maybe you could say "this kind of Swift code moves an array into this
> state". I never wrote that more formally, and I don't think it lines up
> entirely with the way that Swift does things, but it might be interesting
> to pursue.
>
> --

From wilde at mcs.anl.gov Mon Mar 11 08:03:41 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 11 Mar 2013 08:03:41 -0500 (CDT)
Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle
In-Reply-To: <488676748.1965033.1362984828774.JavaMail.root@ci.uchicago.edu>
Message-ID: <1805190927.11638.1363007021988.JavaMail.root@mcs.anl.gov>

I did not see anything like this, that I'm aware of.

What login host were you using on Midway? Maybe one is bad?

I was running from swift.rcc.uchicago.edu, which is midway001.rcc.uchicago.edu.

- Mike

----- Original Message -----
> From: "David Kelly"
> To: "Mihael Hategan"
> Cc: "Swift Devel", "Michael Wilde"
> Sent: Monday, March 11, 2013 1:53:48 AM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
> I did see some svn weirdness this weekend on midway.. seemed like
> checkouts/commits failed about half of the time.
>
> ----- Original Message -----
>
> From: "Mihael Hategan"
> To: "Michael Wilde"
> Cc: "Swift Devel"
> Sent: Monday, March 11, 2013 1:47:07 AM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
>
> Nevermind. Found java and some other goodies in your home directory.
>
> I'm seeing something very weird on that machine. The internets are just crazy.
Like connections to svn repos fail more often than not (while > working fine from anywhere else), ssh to beagle fails randomly, > downloads of coaster jars fail randomly (I have not personally seen > this > one before). All with "connection timed out". > > Did you see anything similar or am I just being lucky right now? > > Mihael > > On Sun, 2013-03-10 at 23:06 -0700, Mihael Hategan wrote: > > How do you get java on that machine? > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > From wilde at mcs.anl.gov Mon Mar 11 08:12:06 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 11 Mar 2013 08:12:06 -0500 (CDT) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1362984427.3637.4.camel@echo> Message-ID: <1693599861.15199.1363007526187.JavaMail.root@mcs.anl.gov> On midway you use the module command to load packages like java, as on beagle and most CI machines. My login-time module load commands for rcc machines are in /home/wilde/.modules, which does: module load java module load ant ...which gets java from /software/java-1.7-x86_64/bin/java and ant from /software/ant-1.8.4-all/bin/ant On beagle, my .modules does: module load java/jdk1.7.0_07 ...which gets java from /opt/java/jdk1.7.0_07/bin/java - Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Michael Wilde" > Cc: "Swift Devel" > Sent: Monday, March 11, 2013 1:47:07 AM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle > > Nevermind. Found java and some other goodies in your home directory. > > I'm seeing something very weird on that machine. The internets are > just > crazy. 
Like connections to svn repos fail more often than not (while > working fine from anywhere else), ssh to beagle fails randomly, > downloads of coaster jars fail randomly (I have not personally seen > this > one before). All with "connection timed out". > > Did you see anything similar or am I just being lucky right now? > > Mihael > > On Sun, 2013-03-10 at 23:06 -0700, Mihael Hategan wrote: > > How do you get java on that machine? > > > > Mihael > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > From davidk at ci.uchicago.edu Mon Mar 11 08:27:17 2013 From: davidk at ci.uchicago.edu (David Kelly) Date: Mon, 11 Mar 2013 08:27:17 -0500 (CDT) Subject: [Swift-devel] Cant get auto-coasters to run from midway to beagle In-Reply-To: <1805190927.11638.1363007021988.JavaMail.root@mcs.anl.gov> Message-ID: <911684712.1996490.1363008437375.JavaMail.root@ci.uchicago.edu> I was connecting to midway.rcc.uchicago.edu (which I think connected me to midway-login2). I haven't been able to connect to swift.rcc.uchicago.edu in the last few days - Pierre and I get an error that say " shell request failed on channel 0" when we try to connect. I have an email in with rcc support about that. ----- Original Message ----- > From: "Michael Wilde" > To: "David Kelly" > Cc: "Swift Devel" , "Mihael Hategan" > > Sent: Monday, March 11, 2013 8:03:41 AM > Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway > to beagle > I did not see anything like this, that Im aware of. > What login host were you using on Midway? Maybe one is bad? > I was running from swift.rcc.uchicago.edu, which is > midway001.rcc.uchicago.edu. 
> - Mike > ----- Original Message ----- > > From: "David Kelly" > > To: "Mihael Hategan" > > Cc: "Swift Devel" , "Michael Wilde" > > > > Sent: Monday, March 11, 2013 1:53:48 AM > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > midway to beagle > > > > > > I did see some svn weirdness this weekend on midway.. seemed like > > checkouts/commits failed about half of the time. > > > > ----- Original Message ----- > > > > > > From: "Mihael Hategan" > > To: "Michael Wilde" > > Cc: "Swift Devel" > > Sent: Monday, March 11, 2013 1:47:07 AM > > Subject: Re: [Swift-devel] Cant get auto-coasters to run from > > midway > > to beagle > > > > Nevermind. Found java and some other goodies in your home > > directory. > > > > I'm seeing something very weird on that machine. The internets are > > just > > crazy. Like connections to svn repos fail more often than not > > (while > > working fine from anywhere else), ssh to beagle fails randomly, > > downloads of coaster jars fail randomly (I have not personally seen > > this > > one before). All with "connection timed out". > > > > Did you see anything similar or am I just being lucky right now? > > > > Mihael > > > > On Sun, 2013-03-10 at 23:06 -0700, Mihael Hategan wrote: > > > How do you get java on that machine? > > > > > > Mihael > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yadudoc1729 at gmail.com Mon Mar 11 09:16:34 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Mon, 11 Mar 2013 19:46:34 +0530 Subject: [Swift-devel] Nested loops misbehaving [0.94 & Faster] Message-ID: Hi, I have 5 loops (foreach and iterate mixed) and I see indices repeating in the innermost loop body. This is causing the script to crash as the array content is already closed. This was run on 0.94 (rev 6364). On swift-faster this script fails to compile (I don't see any issue with the script, especially when it compiles and run on 0.94) I could have this checked for other nested loop combinations if that could help. The script and output are attached. -- Regards, Yadu Nand B -------------- next part -------------- A non-text attachment was scrubbed... Name: nested_loops.output Type: application/octet-stream Size: 36364 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_run.swift Type: application/octet-stream Size: 591 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: faster_nested_loops.output Type: application/octet-stream Size: 2879 bytes Desc: not available URL: From tim.g.armstrong at gmail.com Mon Mar 11 09:39:59 2013 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Mon, 11 Mar 2013 09:39:59 -0500 Subject: [Swift-devel] Nested loops misbehaving [0.94 & Faster] In-Reply-To: References: Message-ID: For 0.94, isn't it correct that it prints each combination 10 times because of the 6th innermost foreach loop? On Mon, Mar 11, 2013 at 9:16 AM, Yadu Nand wrote: > Hi, > > I have 5 loops (foreach and iterate mixed) and I see indices repeating in > the > innermost loop body. This is causing the script to crash as the array > content > is already closed. This was run on 0.94 (rev 6364). 
> > On swift-faster this script fails to compile (I don't see any issue > with the script, > especially when it compiles and run on 0.94) > > I could have this checked for other nested loop combinations if that could > help. > > The script and output are attached. > > -- > Regards, > Yadu Nand B > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadudoc1729 at gmail.com Mon Mar 11 09:51:05 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Mon, 11 Mar 2013 20:21:05 +0530 Subject: [Swift-devel] Swift crashing for runs with 1M calls In-Reply-To: <1362974969.1383.11.camel@echo> References: <1362974969.1383.11.camel@echo> Message-ID: Hi Mihael, This kind of information helps a lot. > Yeah, the algorithm for finding loops is recursive. Since it has to deal > with 1M threads, it does fail. The code may not be recursive, but the > dependency is, so when the hang checker builds the dependency graph, it > will essentially try to build a graph that tracks where, say, > array[1000000] is. The complexity of that is 2^n in this case, so, > yeah... So, any loop large enough with a dependency on previous values could trigger this error ? Throwing more memory into the system or giving more time wouldn't help this, right ? I'm also seeing swift hung, after I removed the dependency, probably a different issue. Here's what I see : http://pastebin.com/J9V3xyqU I killed the process after 15 mins. > One solution may be to stop the hang checker when things get too big. > That way it can still be useful for normal stuff. > > The hang checker's invocation here is unfortunate though. the assignment > int range[] = [2:limit:1]; does the silly thing of actually creating an > array with 1M elements (if you say foreach v in range..., that does not > happen). 
It may be useful to change setFieldValue to do a simple > reference assignment instead of copying the entire array, but that might > not be easy given the way that assignment is implemented (i.e. not when > the array is created). Did you mean to say that foreach v in [2:limit:1] would avoid having the 1 million array elements created in range[] ? In what way would this affect the execution of the script, make it faster ? -- Thanks, Yadu Nand B From yadudoc1729 at gmail.com Mon Mar 11 09:53:47 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Mon, 11 Mar 2013 20:23:47 +0530 Subject: [Swift-devel] Nested loops misbehaving [0.94 & Faster] In-Reply-To: References: Message-ID: Oops. My mistake. Script work fine after correction. Thanks Tim :) -Yadu On Mon, Mar 11, 2013 at 8:09 PM, Tim Armstrong wrote: > For 0.94, isn't it correct that it prints each combination 10 times because > of the 6th innermost foreach loop? > > On Mon, Mar 11, 2013 at 9:16 AM, Yadu Nand wrote: >> >> Hi, >> >> I have 5 loops (foreach and iterate mixed) and I see indices repeating in >> the >> innermost loop body. This is causing the script to crash as the array >> content >> is already closed. This was run on 0.94 (rev 6364). >> >> On swift-faster this script fails to compile (I don't see any issue >> with the script, >> especially when it compiles and run on 0.94) >> >> I could have this checked for other nested loop combinations if that could >> help. >> >> The script and output are attached. 
>>
>> --
>> Regards,
>> Yadu Nand B
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>

From eskogen at g.clemson.edu Mon Mar 11 13:17:03 2013
From: eskogen at g.clemson.edu (Eric Skogen)
Date: Mon, 11 Mar 2013 14:17:03 -0400
Subject: [Swift-devel] interpreting swift error message
Message-ID:

The code below gave me an error message from Swift telling me my output file didn't exist. As I understand my code, the file shouldn't need to exist. Can someone explain what happened?

code:
type messagefile;

app (messagefile o) max(messagefile i) {
  java "MaxTemperature" @filename(i) stdout=@filename(o);
}
messagefile input <"1901">;
messagefile out <"out.txt">;
out = max(input);

Error
RunID: 20130311-1410-5i605r58
Progress: time: Mon, 11 Mar 2013 14:10:39 -0400
Progress: time: Mon, 11 Mar 2013 14:10:42 -0400 Stage in:1
Execution failed:
File not found:
/var/tmp/example2.2-20130311-1410-5i605r58/shared/out.txt

I'm writing this code as an example, so although I'm very interested in how to make this work, I'm more interested in how to figure out the problem based on the error message. Thank you for your time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Mon Mar 11 13:32:48 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 11 Mar 2013 11:32:48 -0700
Subject: [Swift-devel] Swift crashing for runs with 1M calls
In-Reply-To:
References: <1362974969.1383.11.camel@echo>
Message-ID: <1363026768.5632.4.camel@echo>

On Mon, 2013-03-11 at 20:21 +0530, Yadu Nand wrote:
> Hi Mihael,
>
> This kind of information helps a lot.
>
> > Yeah, the algorithm for finding loops is recursive. Since it has to deal
> > with 1M threads, it does fail.
The code may not be recursive, but the > > dependency is, so when the hang checker builds the dependency graph, it > > will essentially try to build a graph that tracks where, say, > > array[1000000] is. The complexity of that is 2^n in this case, so, > > yeah... > > So, any loop large enough with a dependency on previous values could > trigger this error ? Throwing more memory into the system or giving more > time wouldn't help this, right ? Right. Though I was wrong about 2^n, it's O(n) instead, but still enough to cause a stack overflow. > > I'm also seeing swift hung, after I removed the dependency, probably a > different issue. > Here's what I see : http://pastebin.com/J9V3xyqU > I killed the process after 15 mins. Not sure what you mean by "removing dependency". > > > One solution may be to stop the hang checker when things get too big. > > That way it can still be useful for normal stuff. > > > > The hang checker's invocation here is unfortunate though. the assignment > > int range[] = [2:limit:1]; does the silly thing of actually creating an > > array with 1M elements (if you say foreach v in range..., that does not > > happen). It may be useful to change setFieldValue to do a simple > > reference assignment instead of copying the entire array, but that might > > not be easy given the way that assignment is implemented (i.e. not when > > the array is created). > > Did you mean to say that foreach v in [2:limit:1] would avoid having the > 1 million array elements created in range[] ? In what way would this > affect the execution of the script, make it faster ? That's right. Range returns a lazy array. Basically each element in the array is initialized when foreach tries to access it. The problem is in the array = range() assignment. The creation of 1M swift array data elements is slow. Slow enough to trigger the hang checker. At least that's my take on it. 
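
A rough analogy for the lazy-versus-eager distinction above, written in Python rather than Swift. It is only a sketch: `lazy_range` and `eager_copy` are made-up names, the inclusive upper bound is an assumption about how `[2:limit:1]` behaves, and nothing here reflects how Swift actually implements arrays or `setFieldValue`.

```python
import sys


def lazy_range(start, stop, step=1):
    """Lazy sequence: elements are produced only when something iterates."""
    return range(start, stop + 1, step)  # inclusive upper bound is assumed


def eager_copy(seq):
    """Eager copy: every element is created before any of them is used."""
    return list(seq)


limit = 1_000_000

lazy = lazy_range(2, limit)   # analogous to: foreach v in [2:limit:1]
eager = eager_copy(lazy)      # analogous to: int range[] = [2:limit:1];

# Both forms yield the same values; only the up-front cost differs.
total_lazy = sum(v * v for v in lazy)
total_eager = sum(v * v for v in eager)

print(total_lazy == total_eager)
print(sys.getsizeof(lazy), sys.getsizeof(eager))
```

The lazy object stays small no matter how large `limit` is, while the eager copy grows with the number of elements, which is the cost the range assignment is described as paying before the loop even starts.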
Mihael From ketancmaheshwari at gmail.com Mon Mar 11 13:56:57 2013 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 11 Mar 2013 13:56:57 -0500 Subject: [Swift-devel] interpreting swift error message In-Reply-To: References: Message-ID: Eric, I think the message does not mean the file was not found by Swift. It means something failed in the application and it was not able to produce the output file out.txt. The file not found appears as part of the stage out routine which expects that file to be present after Swift completes execution. Could you post your sites.xml and tc files. Thanks, Ketan On Mon, Mar 11, 2013 at 1:17 PM, Eric Skogen wrote: > The below code gave me an error message from swift telling me my output > file didn't exist. As I understand my code, the file shouldn't need to > exist. Can someone explain what happened? > > code: > type messagefile; > > app (messagefile o) max(messagefile i) { > java "MaxTemperature" @filename(i) stdout=@filename(o); > } > messagefile input <"1901">; > messagefile out <"out.txt">; > out = max(input); > > Error > RunID: 20130311-1410-5i605r58 > Progress: time: Mon, 11 Mar 2013 14:10:39 -0400 > Progress: time: Mon, 11 Mar 2013 14:10:42 -0400 Stage in:1 > Execution failed: > File not found: > /var/tmp/example2.2-20130311-1410-5i605r58/shared/out.txt > > > I'm writing this code as an example so although I'm very interested in how > to make this work, I'm more interest in how to figure out the problem based > on the error message. Thank you for your time. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Mon Mar 11 14:32:38 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 11 Mar 2013 14:32:38 -0500 (CDT) Subject: [Swift-devel] interpreting swift error message In-Reply-To: Message-ID: <2058262902.74940.1363030358989.JavaMail.root@mcs.anl.gov> There is something strange going on in your test which we need to diagnose. In your example, the app ("java") should be writing something to stdout. So its hard to see how swift could produce the error you show. We'll need to look into and and will need all the files you used, in addition to what Ketan asked for. I tried two tests to see whats happening. In my first test, I had a simple shell app that, like your java example, expected something on stdout. I tested with a app() that did nothing on stdout. It yielded an empty file, as I would expect it to (not an error). I also tried to do what it looks like you are trying, but just using shell commands. That also worked as expected, and is below. So something is odd somewhere in your test, either a swift bug or something subtle in the calling of java. 
Here's my working "max temp" example:

$ cat max.swift
type file;

app (file o) maxT(file i) {
  sh "-c" @strcat("sort -nr ", @filename(i), " | head -1") stdout=@filename(o);
}

file maxtemp<"maxtemp.txt">;
file intemps<"temperatures.data">;
maxtemp = maxT(intemps);

$ cat tc.sh
localhost sh /bin/sh null null null

$ cat temperatures.data
98.6
101.2
97.1
104.4
32.0
212.1

$ rm maxtemp.txt
$ swift -tc.file tc.sh max.swift
Swift 0.94 swift-r6362 cog-r3637

RunID: 20130311-1926-9mre8yxg
Progress: time: Mon, 11 Mar 2013 19:26:34 +0000
Final status: Mon, 11 Mar 2013 19:26:35 +0000 Finished successfully:1
$ cat maxtemp.txt
212.1
$

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "Eric Skogen"
> Cc: "Swift Devel"
> Sent: Monday, March 11, 2013 1:56:57 PM
> Subject: Re: [Swift-devel] interpreting swift error message
>
> Eric,
>
> I think the message does not mean the file was not found by Swift. It
> means something failed in the application and it was not able to
> produce the output file out.txt.
>
> The file not found appears as part of the stage out routine which
> expects that file to be present after Swift completes execution.
>
> Could you post your sites.xml and tc files.
>
> Thanks,
> Ketan
>
> On Mon, Mar 11, 2013 at 1:17 PM, Eric Skogen < eskogen at g.clemson.edu >
> wrote:
>
> The below code gave me an error message from swift telling me my
> output file didn't exist. As I understand my code, the file
> shouldn't need to exist. Can someone explain what happened?
> > code: > type messagefile; > > app (messagefile o) max(messagefile i) { > java "MaxTemperature" @filename(i) stdout=@filename(o); > } > messagefile input <"1901">; > messagefile out <"out.txt">; > out = max(input); > > Error > RunID: 20130311-1410-5i605r58 > Progress: time: Mon, 11 Mar 2013 14:10:39 -0400 > Progress: time: Mon, 11 Mar 2013 14:10:42 -0400 Stage in:1 > Execution failed: > File not found: > /var/tmp/example2.2-20130311-1410-5i605r58/shared/out.txt > > > I'm writing this code as an example so although I'm very interested > in how to make this work, I'm more interest in how to figure out the > problem based on the error message. Thank you for your time. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > -- > Ketan > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Mon Mar 11 14:54:10 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 11 Mar 2013 12:54:10 -0700 Subject: [Swift-devel] Nested loops misbehaving [0.94 & Faster] In-Reply-To: References: Message-ID: <1363031650.5632.6.camel@echo> Wait, do we really have "iterate {} until (cond)" loop around WHILE the condition is true? On Mon, 2013-03-11 at 20:23 +0530, Yadu Nand wrote: > Oops. My mistake. Script work fine after correction. > > Thanks Tim :) > > -Yadu > > On Mon, Mar 11, 2013 at 8:09 PM, Tim Armstrong > wrote: > > For 0.94, isn't it correct that it prints each combination 10 times because > > of the 6th innermost foreach loop? > > > > On Mon, Mar 11, 2013 at 9:16 AM, Yadu Nand wrote: > >> > >> Hi, > >> > >> I have 5 loops (foreach and iterate mixed) and I see indices repeating in > >> the > >> innermost loop body. 
This is causing the script to crash as the array > >> content > >> is already closed. This was run on 0.94 (rev 6364). > >> > >> On swift-faster this script fails to compile (I don't see any issue > >> with the script, > >> especially when it compiles and run on 0.94) > >> > >> I could have this checked for other nested loop combinations if that could > >> help. > >> > >> The script and output are attached. > >> > >> -- > >> Regards, > >> Yadu Nand B > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > >> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Mon Mar 11 15:00:44 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 11 Mar 2013 13:00:44 -0700 Subject: [Swift-devel] interpreting swift error message In-Reply-To: References: Message-ID: <1363032044.5632.7.camel@echo> On Mon, 2013-03-11 at 14:17 -0400, Eric Skogen wrote: > Execution failed: > File not found: > /var/tmp/example2.2-20130311-1410-5i605r58/shared/out.txt > Could it be that /var/tmp does not exist and you don't have rights to create it? Mihael From yadudoc1729 at gmail.com Mon Mar 11 15:36:25 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 12 Mar 2013 02:06:25 +0530 Subject: [Swift-devel] Swift crashing for runs with 1M calls In-Reply-To: <1363026768.5632.4.camel@echo> References: <1362974969.1383.11.camel@echo> <1363026768.5632.4.camel@echo> Message-ID: >> So, any loop large enough with a dependency on previous values could >> trigger this error ? Throwing more memory into the system or giving more >> time wouldn't help this, right ? > > Right. Though I was wrong about 2^n, it's O(n) instead, but still enough > to cause a stack overflow. Okay. Hmm.. 
so we could measure how much more loops we can afford with more memory. >> I'm also seeing swift hung, after I removed the dependency, probably a >> different issue. >> Here's what I see : http://pastebin.com/J9V3xyqU >> I killed the process after 15 mins. > > Not sure what you mean by "removing dependency". I just removed the dependency on array values filled in by previous threads. instead of array[n] = array[n-1] + array[n-2], I went with array[n] = n*n; So hang checker would determine that the threads of execution are independent, right ? >> > One solution may be to stop the hang checker when things get too big. >> > That way it can still be useful for normal stuff. >> > >> > The hang checker's invocation here is unfortunate though. the assignment >> > int range[] = [2:limit:1]; does the silly thing of actually creating an >> > array with 1M elements (if you say foreach v in range..., that does not >> > happen). It may be useful to change setFieldValue to do a simple >> > reference assignment instead of copying the entire array, but that might >> > not be easy given the way that assignment is implemented (i.e. not when >> > the array is created). >> >> Did you mean to say that foreach v in [2:limit:1] would avoid having the >> 1 million array elements created in range[] ? In what way would this >> affect the execution of the script, make it faster ? > > That's right. Range returns a lazy array. Basically each element in the > array is initialized when foreach tries to access it. > > The problem is in the array = range() assignment. The creation of 1M > swift array data elements is slow. Slow enough to trigger the hang > checker. At least that's my take on it. Oh cool! The code without the range assignment, ran 1M loops. 
1M in 1m22.327s (I think this is a big improvement)
10M -> crashes with error (http://pastebin.com/bac6FEms)

--
Thanks and Regards,
Yadu Nand B

From hategan at mcs.anl.gov Mon Mar 11 15:42:04 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 11 Mar 2013 13:42:04 -0700
Subject: [Swift-devel] Swift crashing for runs with 1M calls
In-Reply-To:
References: <1362974969.1383.11.camel@echo> <1363026768.5632.4.camel@echo>
Message-ID: <1363034524.10260.3.camel@echo>

On Tue, 2013-03-12 at 02:06 +0530, Yadu Nand wrote:
> >> So, any loop large enough with a dependency on previous values could
> >> trigger this error ? Throwing more memory into the system or giving more
> >> time wouldn't help this, right ?
> >
> > Right. Though I was wrong about 2^n, it's O(n) instead, but still enough
> > to cause a stack overflow.
>
> Okay. Hmm.. so we could measure how much more loops we can afford
> with more memory.

Well, no. The hang checker should be limited to searching for loops of limited size. A message showing the dependency graph of the fib series with 1M elements would not be very useful.

>
> >> I'm also seeing swift hung, after I removed the dependency, probably a
> >> different issue.
> >> Here's what I see : http://pastebin.com/J9V3xyqU
> >> I killed the process after 15 mins.
> >
> > Not sure what you mean by "removing dependency".
>
> I just removed the dependency on array values filled in by previous threads.
> instead of array[n] = array[n-1] + array[n-2], I went with array[n] = n*n;
> So hang checker would determine that the threads of execution are independent,
> right ?

Correct. You should not get a stack overflow even if the hang checker is invoked.

>
> >[...]
> >
> > The problem is in the array = range() assignment. The creation of 1M
> > swift array data elements is slow. Slow enough to trigger the hang
> > checker. At least that's my take on it.
>
> Oh cool! The code without the range assignment, ran 1M loops.
> 1M in 1m22.327s (I think this is a big improvement) > 10M -> crashes with error (http://pastebin.com/bac6FEms) I'm not surprised 10M threads is a lot. Mihael From eskogen at g.clemson.edu Mon Mar 11 19:10:13 2013 From: eskogen at g.clemson.edu (Eric Skogen) Date: Mon, 11 Mar 2013 20:10:13 -0400 Subject: [Swift-devel] interpreting swift error message In-Reply-To: <1363032044.5632.7.camel@echo> References: <1363032044.5632.7.camel@echo> Message-ID: It does seem to be a java thing. It worked fine for me with shell commands too. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- MaxTemperatureMapper 1901 -------------- next part -------------- A non-text attachment was scrubbed... Name: 1901 Type: application/octet-stream Size: 888189 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: example2.2.swift Type: application/octet-stream Size: 197 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tc.data Type: application/octet-stream Size: 1277 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sites.xml Type: text/xml Size: 4550 bytes Desc: not available URL: From eskogen at g.clemson.edu Mon Mar 11 19:12:57 2013 From: eskogen at g.clemson.edu (Eric Skogen) Date: Mon, 11 Mar 2013 20:12:57 -0400 Subject: [Swift-devel] interpreting swift error message In-Reply-To: References: <1363032044.5632.7.camel@echo> Message-ID: oops forgot the java file in question. For the record, I know java's not the best way to do this, I was doing it to illustrate calling external applications and java was one of the things on the list I planned to use. I'd just switch languages, but the error interested me. On Mon, Mar 11, 2013 at 8:10 PM, Eric Skogen wrote: > It does seem to be a java thing. 
It worked fine for me with shell > commands too. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: MaxTemperature.java Type: application/octet-stream Size: 914 bytes Desc: not available URL: From wilde at mcs.anl.gov Mon Mar 11 19:31:29 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 11 Mar 2013 19:31:29 -0500 (CDT) Subject: [Swift-devel] interpreting swift error message In-Reply-To: Message-ID: <1930886571.110832.1363048289919.JavaMail.root@mcs.anl.gov> How about a python script to start with, instead of Java? You can readily point tc.data to a python script that starts with: #! /usr/bin/env python You can also point tc.data to /usr/bin/python and pass the script as an input file. I suspect there is something strange about the way your java app is being called; Im sure we can debug that too. - Mike ----- Original Message ----- > From: "Eric Skogen" > To: swift-devel at ci.uchicago.edu > Sent: Monday, March 11, 2013 7:12:57 PM > Subject: Re: [Swift-devel] interpreting swift error message > > > oops forgot the java file in question. For the record, I know java's > not the best way to do this, I was doing it to illustrate calling > external applications and java was one of the things on the list I > planned to use. I'd just switch languages, but the error interested > me. > > > On Mon, Mar 11, 2013 at 8:10 PM, Eric Skogen < eskogen at g.clemson.edu > > wrote: > > > It does seem to be a java thing. It worked fine for me with shell > commands too. 
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

From ketancmaheshwari at gmail.com Mon Mar 11 22:59:19 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 11 Mar 2013 22:59:19 -0500
Subject: [Swift-devel] interpreting swift error message
In-Reply-To:
References: <1363032044.5632.7.camel@echo>
Message-ID:

Eric,

I could run your script after setting the CLASSPATH variable to the directory where your Java class is located:

[thwomp:Eric_scripts]$ swift -tc.file tc.data -sites.file sites.xml example2.2.swift
Swift trunk swift-r6362 cog-r3637

RunID: 20130311-2257-kn122lk3
Progress: time: Mon, 11 Mar 2013 22:57:27 -0500
Execution failed:
Exception in java:
Arguments: [MaxTemperature, in.txt]
Host: localhost
Directory: example2.2-20130311-2257-kn122lk3/jobs/q/java-qtxl8h6l
stderr.txt: Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: java.lang.ClassNotFoundException: MaxTemperature
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: MaxTemperature. Program will exit.
stdout.txt:
Caused by: File not found: /var/tmp/example2.2-20130311-2257-kn122lk3/shared/out.txt
max, example2.2.swift, line 8

[thwomp:Eric_scripts]$ export CLASSPATH=/homes/ketan/Eric_scripts:$CLASSPATH
[thwomp:Eric_scripts]$ swift -tc.file tc.data -sites.file sites.xml example2.2.swift
Swift trunk swift-r6362 cog-r3637

RunID: 20130311-2257-6s3ixfr5
Progress: time: Mon, 11 Mar 2013 22:57:52 -0500
Final status: Mon, 11 Mar 2013 22:57:53 -0500 Finished successfully:1

See if this works for you.

Regards,
Ketan

On Mon, Mar 11, 2013 at 7:12 PM, Eric Skogen wrote:
> oops forgot the java file in question. For the record, I know java's not
> the best way to do this, I was doing it to illustrate calling external
> applications and java was one of the things on the list I planned to use.
> I'd just switch languages, but the error interested me.
>
> On Mon, Mar 11, 2013 at 8:10 PM, Eric Skogen wrote:
>
>> It does seem to be a java thing. It worked fine for me with shell
>> commands too.
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

--
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadudoc1729 at gmail.com Tue Mar 12 07:25:53 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Tue, 12 Mar 2013 17:55:53 +0530
Subject: [Swift-devel] Jenkins - Open questions | issues
Message-ID:

Hi,

From the meeting we had on Friday, a couple of questions remain open regarding Jenkins.

1. On Jenkins, once a remote node is added as a slave to the pool using some credentials, all access to that node is made using the credentials that it was created with.
   - We can limit access to creation, configure and run privileges to avoid misuse.
   - We could mark certain slaves as tied to certain jobs, and restrict access (here again, only users with configure, create permission can modify jobs).
   - Could label slaves to run only on explicitly tied slave nodes.

2. What if the results need to be replicated or inspected?
   - All artifacts from a run could be archived for inspection.
   - If we need to replicate the test under some other credentials, the nightly script could be run independently without Jenkins. We could have the scripts themselves put in svn for this kind of situation, leaving the dependency on Jenkins for a test minimal.

3. Jenkins needs java (raised by Ken in a separate mail Justin had sent).
   - Wouldn't swift need java? In which case can't we safely forget slaves with no java for testing?

4. Jenkins on MCS or our own separate instance?
   - Justin, the answer from Ken about adding other remote systems to the Jenkins at MCS was negative, right?

Some of these are not perfect solutions, but whether they are acceptable is a question for Mike. (A few are still questions, sorry about that.) If I missed any other Jenkins-related question or issue, please add it to this thread.

-Yadu

From wilde at mcs.anl.gov Tue Mar 12 08:51:01 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 12 Mar 2013 08:51:01 -0500 (CDT)
Subject: [Swift-devel] Coaster run to UC3 dies with channel timeout
Message-ID: <1716155600.154516.1363096261076.JavaMail.root@mcs.anl.gov>

This demo (for OSG all-hands) was running fairly reliably, hundreds to a few thousand 30-second tasks to UC3 with flocking to OSG and other pools. But I just got a failure, so it looks like sporadic problems remain.

Running Swift 0.94 latest rev.

Log is on midway in:

/home/wilde/osgdemo/modis/svn/swiftdemo/test.uc3
-rw-rw-r-- 1 wilde wilde 11632001 Mar 12 08:42 saved/modis-20130312-1335-p30ylps9.log

I'll file a ticket once we get a sense of the frequency.
- Mike Progress: time: Tue, 12 Mar 2013 13:42:28 +0000 Selecting site:461 Stage in:10 Submitted:782 Active:204 Stage out:4 Finished successfully:1539 Progress: time: Tue, 12 Mar 2013 13:42:29 +0000 Selecting site:453 Stage in:6 Submitted:779 Active:215 Finished successfully:1547 Progress: time: Tue, 12 Mar 2013 13:42:30 +0000 Selecting site:439 Stage in:16 Submitting:1 Submitted:776 Active:204 Stage out:2 Finished successfully:1562 Execution failed: Exception in perl: Arguments: [getlanduse.pl, input/h06v33.rgb] Host: uc3 Directory: modis-20130312-1335-p30ylps9/jobs/7/perl-7s7qvh6l Caused by: Task failed: null org.globus.cog.karajan.workflow.service.TimeoutException: Channel timed out. lastTime=130312-084030.762, now=130312-084231.763, channel=TCP-0312-3508510-000259-000000 at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.checkTimeouts(AbstractKarajanChannel.java:131) at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel$1.run(AbstractKarajanChannel.java:122) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) getLandUse, modis.swift, line 24 swift$ pwd /home/wilde/osgdemo/modis/svn/swiftdemo/test.uc3 swift$ ls cf input/ modis-20130312-1335-p30ylps9.0.rlog modis-20130312-1335-p30ylps9.log saved/ tc getlanduse.pl* landuse/ modis-20130312-1335-p30ylps9.d/ run* swift.log uc3.xml swift$ e ../save swift$ save swift$ ls saved modis-20130312-1326-n9rofj6e.d/ modis-20130312-1329-f2a2eic4.log modis-20130312-1335-p30ylps9.log modis-20130312-1326-n9rofj6e.log modis-20130312-1335-p30ylps9.0.rlog swift.log modis-20130312-1329-f2a2eic4.d/ modis-20130312-1335-p30ylps9.d/ swift$ ls saved/modis-20130312-1335-p30ylps9.log saved/modis-20130312-1335-p30ylps9.log swift$ pwd; ls -l saved/modis-20130312-1335-p30ylps9.log /home/wilde/osgdemo/modis/svn/swiftdemo/test.uc3 -rw-rw-r-- 1 wilde wilde 11632001 Mar 12 08:42 saved/modis-20130312-1335-p30ylps9.log swift$ -- Michael Wilde Computation Institute, 
University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wozniak at mcs.anl.gov Tue Mar 12 11:10:01 2013
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Tue, 12 Mar 2013 11:10:01 -0500 (CDT)
Subject: [Swift-devel] Jenkins - Open questions | issues
In-Reply-To:
Message-ID: <1995895913.2513909.1363104601218.JavaMail.root@mcs.anl.gov>

Right, Ken said we would not be able to access remote machines from the MCS jenkins. Do you have an idea on how to do it from our own instance? If you have an idea on how to do this, Ken might be willing to support us by letting us run a different jenkins on the same machine on a different port or something. Otherwise we would need to find a server.

----- Original Message -----
From: "Yadu Nand"
To: "swift-devel"
Sent: Tuesday, March 12, 2013 7:25:53 AM
Subject: [Swift-devel] Jenkins - Open questions | issues

Hi,

From the meeting we had on Friday, a couple of questions remain open regarding Jenkins.

1. On Jenkins, once a remote node is added as a slave to the pool using some credentials, all access to that node is made using the credentials that it was created with.
- We can limit access to creation, configure and run privileges to avoid misuse.
- We could mark certain slaves as tied to certain jobs, and restrict access (here again, only users with configure, create permission can modify jobs).
- Could label slaves to run only on explicitly tied slave nodes.

2. What if the results need to be replicated or inspected ?
- All artifacts from a run could be archived for inspection.
- If we need to replicate the test under some other credentials, the nightly script could be run independently without Jenkins. We could have the scripts themselves put in svn for this kind of situations, leaving the dependency on Jenkins for a test minimal.

3. Jenkins needs java (raised by Ken in a separate mail Justin had sent).
- Wouldn't swift need java?
In which case can't we safely forget slaves with no java for testing ? 4. Jenkins on MCS or our own separate instance ? - Justin, the answer from Ken about adding other remote systems to the Jenkins at MCS was negative, right ? Some of these are not perfect solutions, but whether they are acceptable is a question for Mike. (a few are still questions, sorry about that). If I missed any other Jenkins related question/issue, please add to this thread. -Yadu _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From yadudoc1729 at gmail.com Tue Mar 12 11:40:02 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Tue, 12 Mar 2013 22:10:02 +0530 Subject: [Swift-devel] Jenkins - Open questions | issues In-Reply-To: <1995895913.2513909.1363104601218.JavaMail.root@mcs.anl.gov> References: <1995895913.2513909.1363104601218.JavaMail.root@mcs.anl.gov> Message-ID: Hi Justin, I know how to add remote systems to Jenkins. If we could get an instance at MCS with admin privileges I can set up the resource pool and start adding test jobs. -Yadu On Tue, Mar 12, 2013 at 9:40 PM, Justin M Wozniak wrote: > > Right, Ken said we would not be able to access remote machines from the MCS jenkins. Do you have an idea on how to do it from our own instance? If you have an idea on how to do this, Ken might be willing to support us by letting us run a different jenkins on the same machine on a different port or something. Otherwise we would need to find a server. > > ----- Original Message ----- > From: "Yadu Nand" > To: "swift-devel" > Sent: Tuesday, March 12, 2013 7:25:53 AM > Subject: [Swift-devel] Jenkins - Open questions | issues > > Hi, > > From the meeting we had on Friday, a couple of questions remain open > regarding Jenkins. > > 1. 
On Jenkins, once a remote node is added as a slave to the pool > using some credentials, > all access to that node is made using the credentials that it was created with. > - We can limit access to creation, configure and run privileges to avoid misuse. > - We could mark certain slaves as tied to certain jobs, and restrict > access (here again, only > users with configure, create permission can modify jobs). > - Could label slaves to run only on explicitly tied slave nodes. > > 2. What if the results need to be replicated or inspected ? > - All artifacts from a run could be archived for inspection. > - If we need to replicate the test under some other credentials, the > nightly script could be > run independently without Jenkins. We could have the scripts > themselves put in svn for > this kind of situations, leaving the dependency on Jenkins for a test minimal. > > 3. Jenkins needs java (raised by Ken in a separate mail Justin had sent). > - Wouldn't swift need java? In which case can't we safely forget > slaves with no java > for testing ? > > 4. Jenkins on MCS or our own separate instance ? > - Justin, the answer from Ken about adding other remote systems to the > Jenkins at MCS > was negative, right ? > > Some of these are not perfect solutions, but whether they are > acceptable is a question > for Mike. (a few are still questions, sorry about that). > > If I missed any other Jenkins related question/issue, please add to this thread. 
>
> -Yadu
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

--
Thanks and Regards,
Yadu Nand B

From wozniak at mcs.anl.gov Thu Mar 14 15:55:56 2013
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 14 Mar 2013 15:55:56 -0500
Subject: [Swift-devel] Swift Weekly code discussions
In-Reply-To: <5134C841.50505@mcs.anl.gov>
References: <5134C841.50505@mcs.anl.gov>
Message-ID: <5142395C.7010506@mcs.anl.gov>

There was a single winning time slot: Friday at 2pm.

So, our first session will be Friday, March 22 at 2pm. Tim will present MPI and ADLB, which are key technologies used by Swift/T. (I will be on travel.)

Let's plan to meet at the CI. I will set up a room.

On 03/04/2013 10:13 AM, Justin M Wozniak wrote:
> Hi all
> As discussed on swift-devel, we are going to start weekly code
> discussion meetings. We will do screen sharing and recording for
> everyone's benefit, regardless of location. I set up a Doodle poll
> for the time; link is below. We will set the time by the end of this
> week so we can start next week, that is, on/after March 11.
> Justin
>
> http://doodle.com/8ddu6zptb2uf7fgw
>

--
Justin M Wozniak

From yadudoc1729 at gmail.com Tue Mar 19 17:35:41 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Wed, 20 Mar 2013 04:05:41 +0530
Subject: [Swift-devel] MODIS freezes on Midway
Message-ID:

Hi,

I've been running the modis tests on Midway, from the demo that Mike had shared:
/home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz

In the logs (please see attachment) I see a failure message: "sbatch: error: Batch job submission failed: Requested reservation is invalid"

No error messages are shown on stdout, which doesn't help, and Swift just seems to hang forever. * Please help!
* I see test.midway show no progress, with just the same status for about 20mins: Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 Submitted:65 After this, I tried to kill by Ctrl+C and then I get a few error messages : Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s) org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can only cancel an active task at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196) at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95) at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46) at org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332) at org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789) at org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119) at org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271) at org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28) at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88) at 
org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519) at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86) at org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: modis-20130319-1958-jgq018sg.log Type: application/octet-stream Size: 144682 bytes Desc: not available URL: From davidk at ci.uchicago.edu Tue Mar 19 17:57:32 2013 From: davidk at ci.uchicago.edu (David Kelly) Date: Tue, 19 Mar 2013 17:57:32 -0500 (CDT) Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: Message-ID: <1035476791.2136343.1363733852113.JavaMail.root@ci.uchicago.edu> Yadu, Could you please also send the following files? /home/yadunand/.globus/scripts/Slurm9205114084934863737.submit /home/yadunand/.globus/scripts/Slurm9069957957290161585.submit /home/yadunand/.globus/scripts/Slurm741523248459325194.submit /home/yadunand/.globus/scripts/Slurm8627038042414459018.submit Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidk at ci.uchicago.edu Tue Mar 19 18:00:53 2013 From: davidk at ci.uchicago.edu (David Kelly) Date: Tue, 19 Mar 2013 18:00:53 -0500 (CDT) Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: Message-ID: <200982633.2137033.1363734053295.JavaMail.root@ci.uchicago.edu> Yadu, The setup.sh script sets this environment variable: export SBATCH_RESERVATION=osg I believe sbatch is picking up on this and trying to run with this reservation, which is likely expired. Can you try unsetting SBATCH_RESERVATION, commenting out that line in setup.sh and trying again? 
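As a rough sketch of that suggestion (assuming a POSIX shell with sed available; `disable_reservation` is a made-up helper name, not something shipped with the demo, and the default `setup.sh` path is a guess at where the script lives):

```shell
# Hypothetical helper (not part of the demo): drop the stale reservation from
# the current shell and comment out its export line in a setup script.
disable_reservation() {
    script="${1:-setup.sh}"   # path to setup.sh is an assumption

    # sbatch reads SBATCH_RESERVATION from the environment, so clear it here.
    unset SBATCH_RESERVATION

    # Comment out the export line; the original is kept as "$script.bak".
    sed -i.bak 's/^\([[:space:]]*export SBATCH_RESERVATION=\)/# \1/' "$script"
}
```

It would need to be defined and called (for example `disable_reservation setup.sh`) in the same shell that later launches the run, since `unset` only affects the calling shell.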
Thanks,
David

----- Original Message -----
> From: "Yadu Nand"
> To: "swift-devel"
> Sent: Tuesday, March 19, 2013 5:35:41 PM
> Subject: [Swift-devel] MODIS freezes on Midway
> Hi,
> I've been running the modis tests on Midway, from the demo that mike
> had shared:
> /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz
> In the logs(please see attachment) I see a fail message : "sbatch:
> error: Batch job
> submission failed: Requested reservation is invalid"
> The fact that no error messages are shown on stdout doesn't help,
> plus, swift just
> seems to hang forever. * Please help! *
> I see test.midway show no progress, with just the same status for
> about 20mins:
> Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35
> Submitted:65
> After this, I tried to kill by Ctrl+C and then I get a few error
> messages :
> Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s)
> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> Can only cancel an active task
> at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196)
> at
> org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85)
> at
> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69)
> at
> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106)
> at
> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332)
> at
> org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312)
> at
>
org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789) > at > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119) > at > org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271) > at > org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88) > at > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz > -- > Yadu Nand B > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Tue Mar 19 18:04:44 2013 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Tue, 19 Mar 2013 18:04:44 -0500 Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: <200982633.2137033.1363734053295.JavaMail.root@ci.uchicago.edu> References: <200982633.2137033.1363734053295.JavaMail.root@ci.uchicago.edu> Message-ID: In addition to what David mentioned, from logs, it seems that your sites file is missing this line: Or David, correct me if this is not required in this configuration. 
On Tue, Mar 19, 2013 at 6:00 PM, David Kelly wrote: > Yadu, > > The setup.sh script sets this environment variable: > > export SBATCH_RESERVATION=osg > > I believe sbatch is picking up on this and trying to run with this > reservation, which is likely expired. Can you try unsetting > SBATCH_RESERVATION, commenting out that line in setup.sh and trying again? > > Thanks, > David > > ------------------------------ > > *From: *"Yadu Nand" > *To: *"swift-devel" > *Sent: *Tuesday, March 19, 2013 5:35:41 PM > *Subject: *[Swift-devel] MODIS freezes on Midway > > > Hi, > > I've been running the modis tests on Midway, from the demo that mike had > shared: > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz > > In the logs(please see attachment) I see a fail message : "sbatch: > error: Batch job > submission failed: Requested reservation is invalid" > > The fact that no error messages are shown on stdout doesn't help, plus, > swift just > seems to hang forever. * Please help! * > > I see test.midway show no progress, with just the same status for about > 20mins: > Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 > Submitted:65 > > After this, I tried to kill by Ctrl+C and then I get a few error messages : > Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s) > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Can > only cancel an active task > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95) > at > 
org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789) > at > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119) > at > org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271) > at > org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88) > at > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86) > at > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz > > -- > Yadu Nand B > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From wilde at mcs.anl.gov Tue Mar 19 18:56:19 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 19 Mar 2013 18:56:19 -0500 (CDT)
Subject: [Swift-devel] MODIS freezes on Midway
In-Reply-To:
Message-ID: <100957623.1453227.1363737379598.JavaMail.root@mcs.anl.gov>

Likely not needed if it's using provider staging.

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "David Kelly"
> Cc: "swift-devel"
> Sent: Tuesday, March 19, 2013 6:04:44 PM
> Subject: Re: [Swift-devel] MODIS freezes on Midway
>
> In addition to what David mentioned, from logs, it seems that your
> sites file is missing this line:
>
> Or David, correct me if this is not required in this configuration.
>
> On Tue, Mar 19, 2013 at 6:00 PM, David Kelly < davidk at ci.uchicago.edu >
> wrote:
>
> Yadu,
>
> The setup.sh script sets this environment variable:
>
> export SBATCH_RESERVATION=osg
>
> I believe sbatch is picking up on this and trying to run with this
> reservation, which is likely expired. Can you try unsetting
> SBATCH_RESERVATION, commenting out that line in setup.sh and trying
> again?
>
> Thanks,
> David
>
> From: "Yadu Nand" < yadudoc1729 at gmail.com >
> To: "swift-devel" < swift-devel at ci.uchicago.edu >
> Sent: Tuesday, March 19, 2013 5:35:41 PM
> Subject: [Swift-devel] MODIS freezes on Midway
>
> Hi,
>
> I've been running the modis tests on Midway, from the demo that mike
> had shared:
>
> /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz
>
> In the logs(please see attachment) I see a fail message : "sbatch:
> error: Batch job
> submission failed: Requested reservation is invalid"
>
> The fact that no error messages are shown on stdout doesn't help,
> plus, swift just
> seems to hang forever. * Please help!
* > > > I see test.midway show no progress, with just the same status for > about 20mins: > > Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 > Submitted:65 > > > After this, I tried to kill by Ctrl+C and then I get a few error > messages : > > Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s) > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Can only cancel an active task > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332) > at > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789) > at > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119) > at > org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271) > at > org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28) > at > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88) > at > 
org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115)
> /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz
>
> --
> Yadu Nand B
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> --
> Ketan
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

From wilde at mcs.anl.gov Wed Mar 20 11:58:59 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 20 Mar 2013 11:58:59 -0500 (CDT)
Subject: [Swift-devel] SWIFT jobs in C/X states on uc3-sub
In-Reply-To: <744F0739-16E2-4429-A168-457D020232EA@uchicago.edu>
Message-ID: <1431338870.1621411.1363798739929.JavaMail.root@mcs.anl.gov>

My jobs must be fossils; David, Yadu, we should test whether and why Swift doesn't always clean up. I realize that if Swift hangs and needs to be SIGKILL'ed then it can't. But let's see if the Condor provider is cleaning up when Swift gets a catchable signal.

Lincoln, please remove the "wilde" jobs if you can do that.

Thanks,

- Mike

----- Original Message -----
> From: "Lincoln Bryant"
> To: "David Kelly"
> Cc: "Michael Wilde"
> Sent: Wednesday, March 20, 2013 11:26:30 AM
> Subject: SWIFT jobs in C/X states on uc3-sub
>
> Hi David / Mike,
>
> I notice there are a lot of old jobs sitting in the UC3 queue.
> They're sitting in either "X" (removed) or "C" (completed). Sample > below: > > > 70980.0 wilde 3/12 11:42 0+00:18:59 C 0 43.9 perl > > cscript906755 > > 70981.0 wilde 3/12 11:42 0+00:03:14 C 0 46.4 perl > > cscript906755 > > 70982.0 wilde 3/12 11:42 0+00:03:14 C 0 46.4 perl > > cscript906755 > > > 71652.0 davidk 3/13 19:28 0+00:00:01 X 0 0.0 perl > > cscript500002 > > 71653.0 davidk 3/13 19:28 0+00:00:01 X 0 0.0 perl > > cscript500002 > > 71896.0 davidk 3/13 19:38 0+00:00:01 X 0 0.0 perl > > cscript339551 > > Are these jobs completing OK on your side? > > Occasionally I go in and purge old jobs, but I'm curious if there's > something in your submit files that is causing them to stick. > > Cheers, > Lincoln From yadudoc1729 at gmail.com Wed Mar 20 12:19:01 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Wed, 20 Mar 2013 22:49:01 +0530 Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: <100957623.1453227.1363737379598.JavaMail.root@mcs.anl.gov> References: <100957623.1453227.1363737379598.JavaMail.root@mcs.anl.gov> Message-ID: Hi everyone, David, please find the submit files attached to the mail. I am running the 5 different variants of modis, (local,midway,beagle,uc3, multiple) from the test system we have. I am not setting the SBATCH_RESERVATION variable in the setup scripts. Ketan, for modis_local and modis_midway, I am setting and interestingly, modis_local works fine from the stress test apps group now, the rest fail though. I think there is a different issue here now. It looks like most failures I'm seeing now is from perl and tc.data issues. I've attached the modis.stdout from the 5 testcases, if you'd like to take a look. The tc.data supplied is the same as the ones that came with swiftdemo. -Yadu On Wed, Mar 20, 2013 at 5:26 AM, Michael Wilde wrote: > Likely not needed if its using provider staging. 
>
> ----- Original Message -----
> > From: "Ketan Maheshwari"
> > To: "David Kelly"
> > Cc: "swift-devel"
> > Sent: Tuesday, March 19, 2013 6:04:44 PM
> > Subject: Re: [Swift-devel] MODIS freezes on Midway
> >
> > In addition to what David mentioned, from logs, it seems that your
> > sites file is missing this line:
> >
> > Or David, correct me if this is not required in this configuration.
> >
> > On Tue, Mar 19, 2013 at 6:00 PM, David Kelly < davidk at ci.uchicago.edu >
> > wrote:
> >
> > Yadu,
> >
> > The setup.sh script sets this environment variable:
> >
> > export SBATCH_RESERVATION=osg
> >
> > I believe sbatch is picking up on this and trying to run with this
> > reservation, which is likely expired. Can you try unsetting
> > SBATCH_RESERVATION, commenting out that line in setup.sh and trying
> > again?
> >
> > Thanks,
> > David
> >
> > From: "Yadu Nand" < yadudoc1729 at gmail.com >
> > To: "swift-devel" < swift-devel at ci.uchicago.edu >
> > Sent: Tuesday, March 19, 2013 5:35:41 PM
> > Subject: [Swift-devel] MODIS freezes on Midway
> >
> > Hi,
> >
> > I've been running the modis tests on Midway, from the demo that mike
> > had shared:
> >
> > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz
> >
> > In the logs(please see attachment) I see a fail message : "sbatch:
> > error: Batch job
> > submission failed: Requested reservation is invalid"
> >
> > The fact that no error messages are shown on stdout doesn't help,
> > plus, swift just
> > seems to hang forever. * Please help!
* > > > > > > I see test.midway show no progress, with just the same status for > > about 20mins: > > > > Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 > > Submitted:65 > > > > > > After this, I tried to kill by Ctrl+C and then I get a few error > > messages : > > > > Failed to shut down block: Block 0319-5807480-000000 (16x3540.000s) > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > Can only cancel an active task > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196) > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > > at > > > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119) > > at > > > org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271) > > at > > > org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28) > > at > > > 
org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88) > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519) > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86) > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) > > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz > > > > -- > > Yadu Nand B > > > > -- > > Ketan > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > -- Yadu Nand B -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.submit Type: application/octet-stream Size: 923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 2.submit Type: application/octet-stream Size: 923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 3.submit Type: application/octet-stream Size: 921 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 4.submit Type: application/octet-stream Size: 923 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: modis.stdout.all Type: application/octet-stream Size: 36973 bytes Desc: not available URL: From davidk at ci.uchicago.edu Wed Mar 20 14:05:56 2013 From: davidk at ci.uchicago.edu (David Kelly) Date: Wed, 20 Mar 2013 14:05:56 -0500 (CDT) Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: Message-ID: <1218836492.2710588.1363806356329.JavaMail.root@ci.uchicago.edu> Yadu, As a test, I just untarred swiftdemo.v04.tgz, removed the SBATCH_RESERVATION line from setup.sh and was able to run on midway. Send me a message on Skype when you have a few minutes and we can take a closer look at this. David ----- Original Message ----- > From: "Yadu Nand" > To: "Michael Wilde" > Cc: "swift-devel" > Sent: Wednesday, March 20, 2013 12:19:01 PM > Subject: Re: [Swift-devel] MODIS freezes on Midway > Hi everyone, > David, please find the submit files attached to the mail. > I am running the 5 different variants of modis, > (local,midway,beagle,uc3, multiple) > from the test system we have. I am not setting the SBATCH_RESERVATION > variable > in the setup scripts. > Ketan, for modis_local and modis_midway, I am setting provider="local"/> > and interestingly, modis_local works fine from the stress test apps > group now, the rest > fail though. I think there is a different issue here now. > It looks like most failures I'm seeing now is from perl and tc.data > issues. I've attached the > modis.stdout from the 5 testcases, if you'd like to take a look. The > tc.data supplied is the > same as the ones that came with swiftdemo. 
> -Yadu > On Wed, Mar 20, 2013 at 5:26 AM, Michael Wilde < wilde at mcs.anl.gov > wrote: > > Likely not needed if its using provider staging. From wilde at mcs.anl.gov Wed Mar 20 14:13:26 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 20 Mar 2013 14:13:26 -0500 (CDT) Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: <1218836492.2710588.1363806356329.JavaMail.root@ci.uchicago.edu> Message-ID: <452491664.1729681.1363806806887.JavaMail.root@mcs.anl.gov> Also the code is now in svn: https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG_2013-03-11/MODIS - Mike ----- Original Message ----- > From: "David Kelly" > To: "Yadu Nand" > Cc: "swift-devel" , "Michael Wilde" > Sent: Wednesday, March 20, 2013 2:05:56 PM > Subject: Re: [Swift-devel] MODIS freezes on Midway > > > Yadu, > > > As a test, I just untarred swiftdemo.v04.tgz, removed the > SBATCH_RESERVATION line from setup.sh and was able to run on midway. > Send me a message on Skype when you have a few minutes and we can > take a closer look at this. 
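A minimal sketch of the workaround described above, assuming setup.sh exports the variable on a line of its own (export SBATCH_RESERVATION=osg, as quoted earlier in the thread). The function name disable_reservation is hypothetical and not part of the demo. It comments out the export in the given file, keeping a .bak copy, and unsets the variable in the current shell so that sbatch no longer inherits the reservation.

```shell
#!/bin/sh
# Hypothetical helper (not part of swiftdemo): stop sbatch from
# inheriting a stale reservation through SBATCH_RESERVATION.
disable_reservation() {
    setup_file="$1"
    # Keep a backup, then prefix the export line with '# '.
    # Only lines that begin with "export SBATCH_RESERVATION=" are changed.
    cp "$setup_file" "$setup_file.bak" &&
    sed 's/^\([[:space:]]*export[[:space:]][[:space:]]*SBATCH_RESERVATION=\)/# \1/' \
        "$setup_file.bak" > "$setup_file"
    # Clear the variable from the current shell as well, since it may
    # already have been exported by an earlier run of the setup script.
    unset SBATCH_RESERVATION
}
```

One way to use it would be to run disable_reservation setup.sh from the unpacked demo directory and then start the test again from the same shell.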
> > David From yadudoc1729 at gmail.com Wed Mar 20 14:46:03 2013 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 21 Mar 2013 01:16:03 +0530 Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: <452491664.1729681.1363806806887.JavaMail.root@mcs.anl.gov> References: <1218836492.2710588.1363806356329.JavaMail.root@ci.uchicago.edu> <452491664.1729681.1363806806887.JavaMail.root@mcs.anl.gov> Message-ID: Quick update. 
The sites.xml file for modis.midway was messed up while copying. Now the tests for local and midway are running fine from the test suite. I'm seeing the test.beagle fail because there isn't a folder in my name on lustre... It fails with Could not submit job Could not start coaster service Task ended before registration was received. Failed to download bootstrap jar. On test.uc3, I can see jobs getting submitted and completed, but it looks like an exception is thrown in perl, halting the test. Sorry about the formatting, I'm mailing from my phone. -yadu On Mar 21, 2013 12:43 AM, "Michael Wilde" wrote: > Also the code is now in svn: > > https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG_2013-03-11/MODIS > > - Mike From wilde at mcs.anl.gov Wed Mar 20 15:05:14 2013 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 20 Mar 2013 15:05:14 -0500 (CDT) Subject: [Swift-devel] MODIS freezes on Midway In-Reply-To: Message-ID: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov> Yadu, I dont think the beagle test requires you to have any directories on beagle. But it does expect that you have set up your ssh keys so that from the midway login host you can do a password-less ssh to beagle. Test that manually before trying test.beagle. - Mike ----- Original Message ----- > From: "Yadu Nand" > To: "Michael Wilde" > Cc: "David Kelly" , "swift-devel" > Sent: Wednesday, March 20, 2013 2:46:03 PM > Subject: Re: [Swift-devel] MODIS freezes on Midway > > > > Quick update. > > The sites.xml file for modis.midway was messed up while copying. Now > the tests for local and midway are running fine from the test suite. > > I'm seeing the test.beagle fail because there isn't a folder in my > name on lustre... It fails with > Could not submit job > Could not start coaster service > Task ended before registration was received. Failed to download > bootstrap jar. > > On test.uc3, I can see jobs getting submitted and completed, but it > looks like an exception is thrown in perl, halting the test. 
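The manual check of password-less ssh mentioned in this message can be sketched as follows. BatchMode and ConnectTimeout are standard OpenSSH client options; the function name check_passwordless_ssh and the SSH override variable are assumptions made for illustration and do not come from the Swift test suite.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if ssh can log in to HOST without
# prompting for a password or passphrase.
check_passwordless_ssh() {
    host="$1"
    # BatchMode=yes disables interactive prompts, so a missing or
    # unauthorised key makes ssh fail instead of waiting for input.
    # ${SSH:-ssh} lets a different client binary be substituted (assumption).
    if "${SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 "$host" true 2>/dev/null; then
        echo "ok: $host"
    else
        echo "FAILED: $host" >&2
        return 1
    fi
}
```

Called from the midway login host with the address used for beagle in the sites file, a FAILED result would suggest fixing the ssh keys before running test.beagle. A success only shows that login works; it does not rule out other causes of the coaster bootstrap error.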
> > > Sorry about the formatting, I'm mailing from my phone. > > -yadu > On Mar 21, 2013 12:43 AM, "Michael Wilde" < wilde at mcs.anl.gov > > wrote: > > > Also the code is now in svn: > > https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG_2013-03-11/MODIS > > - Mike > > ----- Original Message ----- > > From: "David Kelly" < davidk at ci.uchicago.edu > > > To: "Yadu Nand" < yadudoc1729 at gmail.com > > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >, "Michael Wilde" > > < wilde at mcs.anl.gov > > > Sent: Wednesday, March 20, 2013 2:05:56 PM > > Subject: Re: [Swift-devel] MODIS freezes on Midway > > > > > > Yadu, > > > > > > As a test, I just untarred swiftdemo.v04.tgz, removed the > > SBATCH_RESERVATION line from setup.sh and was able to run on > > midway. > > Send me a message on Skype when you have a few minutes and we can > > take a closer look at this. > > > > David > > > > > > ----- Original Message ----- > > > > > > From: "Yadu Nand" < yadudoc1729 at gmail.com > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu > > > Sent: Wednesday, March 20, 2013 12:19:01 PM > > Subject: Re: [Swift-devel] MODIS freezes on Midway > > > > Hi everyone, > > > > > > David, please find the submit files attached to the mail. > > I am running the 5 different variants of modis, > > (local,midway,beagle,uc3, multiple) > > from the test system we have. I am not setting the > > SBATCH_RESERVATION > > variable > > in the setup scripts. > > > > > > Ketan, for modis_local and modis_midway, I am setting > provider="local"/> > > and interestingly, modis_local works fine from the stress test apps > > group now, the rest > > fail though. I think there is a different issue here now. > > > > > > It looks like most failures I'm seeing now is from perl and tc.data > > issues. I've attached the > > modis.stdout from the 5 testcases, if you'd like to take a look. 
> > The > > tc.data supplied is the > > same as the ones that came with swiftdemo. > > > > > > -Yadu > > > > > > > > > > On Wed, Mar 20, 2013 at 5:26 AM, Michael Wilde < wilde at mcs.anl.gov > > > > > wrote: > > > > > > Likely not needed if its using provider staging. > > > > > > ----- Original Message ----- > > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu > > > > Sent: Tuesday, March 19, 2013 6:04:44 PM > > > Subject: Re: [Swift-devel] MODIS freezes on Midway > > > > > > > > > > > > In addition to what David mentioned, from logs, it seems that > > > your > > > sites file is missing this line: > > > > > > > > > > > > > > > > > > > > > Or David, correct me if this is not required in this > > > configuration. > > > > > > > > > > > > On Tue, Mar 19, 2013 at 6:00 PM, David Kelly < > > > davidk at ci.uchicago.edu > > > > wrote: > > > > > > > > > > > > > > > Yadu, > > > > > > > > > The setup.sh script sets this environment variable: > > > > > > > > > export SBATCH_RESERVATION=osg > > > > > > > > > I believe sbatch is picking up on this and trying to run with > > > this > > > reservation, which is likely expired. Can you try unsetting > > > SBATCH_RESERVATION, commenting out that line in setup.sh and > > > trying > > > again? 
> > > > > > > > > Thanks, > > > David > > > > > > > > > > > > > > > > > > > From: "Yadu Nand" < yadudoc1729 at gmail.com > > > > To: "swift-devel" < swift-devel at ci.uchicago.edu > > > > Sent: Tuesday, March 19, 2013 5:35:41 PM > > > Subject: [Swift-devel] MODIS freezes on Midway > > > > > > > > > > > > > > > Hi, > > > > > > I've been running the modis tests on Midway, from the demo that > > > mike > > > had shared: > > > > > > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz > > > > > > > > > In the logs(please see attachment) I see a fail message : " > > > sbatch: > > > error: Batch job > > > submission failed: Requested reservation is i nvalid" > > > > > > > > > The fact that no error messages are shown on stdout doesn't help, > > > plus, swift just > > > seems to hang forever. * Please help! * > > > > > > > > > I see test.midway show no progress, with just the same status for > > > about 20mins: > > > > > > Progress: time: Tue, 19 Mar 2013 20:21:18 +0000 Selecting site:35 > > > Submitted:65 > > > > > > > > > After this, I tried to kill by Ctrl+C and then I get a few error > > > messages : > > > > > > Failed to shut down block: Block 0319-5807480-000000 > > > (16x3540.000s) > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > > > Can only cancel an active task > > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.cancel(AbstractExecutor.java:196) > > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.cancel(AbstractJobSubmissionTaskHandler.java:85) > > > at > > > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.cancel(AbstractTaskHandler.java:69) > > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:106) > > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.cancel(ExecutionTaskHandler.java:95) > > > at > > > 
org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.cancel(BlockTaskSubmitter.java:46) > > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.forceShutdown(Block.java:332) > > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.Block.shutdown(Block.java:312) > > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdownBlocks(BlockQueueProcessor.java:800) > > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.shutdown(BlockQueueProcessor.java:789) > > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.JobQueue.shutdown(JobQueue.java:119) > > > at > > > org.globus.cog.abstraction.coaster.service.CoasterService.shutdown(CoasterService.java:271) > > > at > > > org.globus.cog.abstraction.coaster.service.ServiceShutdownHandler.requestComplete(ServiceShutdownHandler.java:28) > > > at > > > org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:519) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel.actualSend(AbstractPipedChannel.java:86) > > > at > > > org.globus.cog.karajan.workflow.service.channels.AbstractPipedChannel$Sender.run(AbstractPipedChannel.java:115) > > > /home/wilde/osgdemo/modis/svn/swiftdemo.v04.tgz > > > > > > -- > > > Yadu Nand B > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > -- > > > Ketan > > > > > > > > > 
From yadudoc1729 at gmail.com Wed Mar 20 15:20:22 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Thu, 21 Mar 2013 01:50:22 +0530
Subject: [Swift-devel] MODIS freezes on Midway
In-Reply-To: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov>
References: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov>
Message-ID:

Hi Mike,

I can do passwordless-ssh to beagle and uc3 from midway. I am attaching a
log from the failures I'm seeing, and David just confirmed that he is also
seeing the same.

-Yadu

On Thu, Mar 21, 2013 at 1:35 AM, Michael Wilde wrote:

> Yadu, I dont think the beagle test requires you to have any directories on
> beagle.
>
> But it does expect that you have set up your ssh keys so that from the
> midway login host you can do a password-less ssh to beagle. Test that
> manually before trying test.beagle.
>
> - Mike
>
> ----- Original Message -----
> > From: "Yadu Nand"
> > To: "Michael Wilde"
> > Cc: "David Kelly" , "swift-devel" <swift-devel at ci.uchicago.edu>
> > Sent: Wednesday, March 20, 2013 2:46:03 PM
> > Subject: Re: [Swift-devel] MODIS freezes on Midway
> >
> > Quick update.
> >
> > The sites.xml file for modis.midway was messed up while copying. Now
> > the tests for local and midway are running fine from the test suite.
> >
> > I'm seeing the test.beagle fail because there isn't a folder in my
> > name on lustre... It fails with
> > Could not submit job
> > Could not start coaster service
> > Task ended before registration was received. Failed to download
> > bootstrap jar.
> >
> > On test.uc3, I can see jobs getting submitted and completed, but it
> > looks like an exception is thrown in perl, halting the test.
> >
> > Sorry about the formatting, I'm mailing from my phone.
> >
> > -yadu
> >
> > On Mar 21, 2013 12:43 AM, "Michael Wilde" < wilde at mcs.anl.gov > wrote:
> >
> > Also the code is now in svn:
> >
> > https://svn.ci.uchicago.edu/svn/vdl2/SwiftTutorials/OSG_2013-03-11/MODIS
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "David Kelly" < davidk at ci.uchicago.edu >
> > > To: "Yadu Nand" < yadudoc1729 at gmail.com >
> > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >, "Michael Wilde" < wilde at mcs.anl.gov >
> > > Sent: Wednesday, March 20, 2013 2:05:56 PM
> > > Subject: Re: [Swift-devel] MODIS freezes on Midway
> > >
> > > Yadu,
> > >
> > > As a test, I just untarred swiftdemo.v04.tgz, removed the
> > > SBATCH_RESERVATION line from setup.sh and was able to run on
> > > midway.
> > > Send me a message on Skype when you have a few minutes and we can
> > > take a closer look at this.
> > >
> > > David
> > >
> > > ----- Original Message -----
> > >
> > > From: "Yadu Nand" < yadudoc1729 at gmail.com >
> > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > Cc: "swift-devel" < swift-devel at ci.uchicago.edu >
> > > Sent: Wednesday, March 20, 2013 12:19:01 PM
> > > Subject: Re: [Swift-devel] MODIS freezes on Midway
> > >
> > > Hi everyone,
> > >
> > > David, please find the submit files attached to the mail.
> > > I am running the 5 different variants of modis,
> > > (local,midway,beagle,uc3, multiple)
> > > from the test system we have. I am not setting the
> > > SBATCH_RESERVATION variable in the setup scripts.
> > >
> > > Ketan, for modis_local and modis_midway, I am setting
> > > provider="local"/>
> > > and interestingly, modis_local works fine from the stress test apps
> > > group now, the rest fail though. I think there is a different
> > > issue here now.
> > >
> > > It looks like most failures I'm seeing now is from perl and tc.data
> > > issues. I've attached the modis.stdout from the 5 testcases, if
> > > you'd like to take a look. The tc.data supplied is the same as the
> > > ones that came with swiftdemo.
> > >
> > > -Yadu

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Midway-Modis-UC3-Beagle
Type: application/octet-stream
Size: 13753 bytes
Desc: not available
URL:

From hategan at mcs.anl.gov Wed Mar 20 15:59:54 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 20 Mar 2013 13:59:54 -0700
Subject: [Swift-devel] MODIS freezes on Midway
In-Reply-To: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov>
References: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov>
Message-ID: <1363813194.28485.7.camel@echo>

On Wed, 2013-03-20 at 15:05 -0500, Michael Wilde wrote:
> Yadu, I dont think the beagle test requires you to have any directories on beagle.

We do the SWIFT_USERHOME thing on beagle, and that requires the modified
user home to exist I think.

Might be possible to attempt to create that directory automatically.

Mihael

From wilde at mcs.anl.gov Wed Mar 20 16:59:59 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 20 Mar 2013 16:59:59 -0500 (CDT)
Subject: [Swift-devel] [Swift-user] What system calls do the mappers use?
In-Reply-To: <1363816337.14167.1.camel@echo>
Message-ID: <1350534392.1832467.1363816799515.JavaMail.root@mcs.anl.gov>

Did this fall out of the User Guide or did it never get in there?

Either way, could you (or better yet, Yadu) add it (back)?

Just curious, what keyType values make sense? Not boolean or float, I assume.

Not arrays, files, structures either, I assume? So really just int and strings are practical?

Thanks,

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Michael Wilde"
> Cc: "Lorenzo Pesce" , "Swift User Discussion List"
> Sent: Wednesday, March 20, 2013 4:52:17 PM
> Subject: Re: [Swift-user] What system calls do the mappers use?
>
> They are in trunk. They should also be in 0.94.
>
> You declare them as:
> valueType[keyType] arrayName;
> For example:
> int[string] a;
> a["one"] = 1;
>
> Mihael
>
> On Wed, 2013-03-20 at 16:04 -0500, Michael Wilde wrote:
> > Lorenzo,
> >
> > All Swift arrays are varying in size: you dont declare the array
> > size in the declaration. Further, they can be sparse (because the
> > implementation is in fact a hashtable).
> >
> > Swift has code that supports user-level hashes by declaring
> > arrays with string instead of integer keys. I thought this made it
> > to the User Guide but I see now that it did not. Its
> > possible/likely thats because the code is not in trunk yet.
> >
> > Can anyone on the devel team reply with the status of associative
> > arrays?
> >
> > Thanks,
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Lorenzo Pesce"
> > > To: "Michael Wilde"
> > > Cc: "Swift User Discussion List"
> > > Sent: Wednesday, March 20, 2013 3:43:19 PM
> > > Subject: Re: [Swift-user] What system calls do the mappers use?
> > >
> > > Can one make hashes of arrays in or arrays of arrays of different
> > > sizes in swift?
> > > e.g., an array of an array type of variable size?
> > >
> > > On Mar 20, 2013, at 3:41 PM, Michael Wilde wrote:
> > >
> > > > Also, to answer your question more directly: "I dont know".
> > > > You can try to answer this by writing some very simple swift
> > > > scripts that do the kinds of built-in mappings you are looking
> > > > at, and use strace() with suitable filtering and grepping to see
> > > > what Swift (via Java) is doing to implement the mapping.
> > > >
> > > > Mihael may be able to point you to the Java classes that do the
> > > > mapping to distill this process further.
> > > >
> > > > - Mike
> > > >
> > > > ----- Original Message -----
> > > >> From: "Lorenzo Pesce"
> > > >> To: "Swift User Discussion List"
> > > >> Sent: Wednesday, March 20, 2013 3:27:05 PM
> > > >> Subject: [Swift-user] What system calls do the mappers use?
> > > >>
> > > >> Hi --
> > > >>
> > > >> I am working with mappers that might be repeated thousands of
> > > >> times in each workflow run.
> > > >> Lustre doesn't like that type of search when it is based on
> > > >> approaches similar to "ls", on the other hand "find" works
> > > >> fine.
> > > >>
> > > >> I could conceivably find a work around, but I would rather not
> > > >> have to do it.
> > > >>
> > > >> Lorenzo

From lpesce at uchicago.edu Wed Mar 20 18:24:24 2013
From: lpesce at uchicago.edu (Lorenzo Pesce)
Date: Wed, 20 Mar 2013 18:24:24 -0500
Subject: [Swift-devel] [Swift-user] What system calls do the mappers use?
In-Reply-To: <1350534392.1832467.1363816799515.JavaMail.root@mcs.anl.gov>
References: <1350534392.1832467.1363816799515.JavaMail.root@mcs.anl.gov>
Message-ID:

Hashes are in the guide.

You have a point, if they are all hashtables anyway, it makes little difference.

I will try to work this into the code tomorrow. I will also figure out how to run a function passing to it an array input.

BTW, the tests done so far with the new approach for the GS problem worked very well. We did a medium 740,000 job and it went through without a hitch.

On Mar 20, 2013, at 4:59 PM, Michael Wilde wrote:

> Did this fall out of the User Guide or did it never get in there?
>
> Either way, could you (or better yet, Yadu) add it (back)?
>
> Just curious, what keyType values make sense? Not boolean or float, I assume.
>
> Not arrays, files, structures either, I assume? So really just int and strings are practical?
>
> Thanks,
>
> - Mike

From wilde at mcs.anl.gov Wed Mar 20 21:40:06 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 20 Mar 2013 21:40:06 -0500 (CDT)
Subject: [Swift-devel] [Swift-user] What system calls do the mappers use?
In-Reply-To:
Message-ID: <1464783846.4609.1363833606328.JavaMail.root@mcs.anl.gov>

> Hashes are in the guide.

Ah, then I wasn't imagining things! Turns out that they are documented in the 0.94 user guide but not in the trunk users guide.

I wonder if there's been any other drift between these two User Guide versions (ie other things missing from the Trunk version)?

- Mike

From yadudoc1729 at gmail.com Thu Mar 21 12:31:08 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Thu, 21 Mar 2013 23:01:08 +0530
Subject: [Swift-devel] MODIS freezes on Midway
In-Reply-To: <1363813194.28485.7.camel@echo>
References: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov> <1363813194.28485.7.camel@echo>
Message-ID:

@Ketan, I believe I spoke too early yesterday about the
/lustre/beagle/yadunand folder. The error, "No such file or directory",
went away after the yadunand/swiftwork directory was created manually.

@Mihael, I think it would really help if the directory creation could be
handled by swift.

Yesterday, I tried some things David asked me to try, setting the following:

GLOBUS_HOSTNAME=128.135.112.73
GLOBUS_TCP_PORT_RANGE=50000,51000

The issue here, as I understood it, is that the workers on beagle try to
connect back to midway to access the files, and the IP resolves to an
internal IP which they can't access. This is happening despite setting
GLOBUS_HOSTNAME explicitly. Even the port range that I've set does not
seem to be used:

Failed to start channel GSSCChannel-https://192.5.86.107:60851(2)[ https://192.5.86.107:60851]

I'm also attaching a log from the beagle run.
The files used for the run are on midway:
/home/yadunand/swiftdemo-trunk/test.beagle

-Yadu

On Thu, Mar 21, 2013 at 2:29 AM, Mihael Hategan wrote:

> On Wed, 2013-03-20 at 15:05 -0500, Michael Wilde wrote:
> > Yadu, I dont think the beagle test requires you to have any directories
> > on beagle.
>
> We do the SWIFT_USERHOME thing on beagle, and that requires the modified
> user home to exist I think.
>
> Might be possible to attempt to create that directory automatically.
>
> Mihael

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Thu Mar 21 12:46:10 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 21 Mar 2013 12:46:10 -0500
Subject: [Swift-devel] MODIS freezes on Midway
In-Reply-To:
References: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov> <1363813194.28485.7.camel@echo>
Message-ID:

Yadu,

netcat/nc has been helpful in the past resolving network/port connectivity
issues. I tried to write to midway port 50000 from beagle which worked:

on midway:
$ nc -l 50000

on beagle:
$ netcat 128.135.112.71 50000
hello from beagle

Gives on midway:
$ nc -l 50000
hello from beagle

Hope this helps with debugging.

On Thu, Mar 21, 2013 at 12:31 PM, Yadu Nand wrote:

> @Ketan, I believe I spoke too early yesterday about
> /lustre/beagle/yadunand folder.
> The error, No such file or directory went away after yadunand/swiftwork
> directory was created manually.
> @Mihael, I think it would really help if the directory creation could be
> handled by swift.
>
> Yesterday, I tried some things David asked me to try setting the following:
> GLOBUS_HOSTNAME=128.135.112.73
> GLOBUS_TCP_PORT_RANGE=50000,51000
> The issue here as I understood it, is that, the workers on beagle try to
> connect back to midway to access the files, and the IP resolves to
> internal IP which it can't access.
> This is happening despite setting the GLOBUS_HOSTNAME explicitly. Even the
> port range that I've set does not seem to be used:
>
> Failed to start channel GSSCChannel-https://192.5.86.107:60851(2)[ https://192.5.86.107:60851]
>
> I'm also attaching a log from the beagle run.
> The files used for the run are on
> midway /home/yadunand/swiftdemo-trunk/test.beagle
>
> -Yadu

--
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.g.armstrong at gmail.com Fri Mar 22 13:48:10 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 22 Mar 2013 13:48:10 -0500
Subject: [Swift-devel] Weekly training sessions
In-Reply-To: <605535474.2169579.1362138442159.JavaMail.root@mcs.anl.gov>
References: <605535474.2169579.1362138442159.JavaMail.root@mcs.anl.gov>
Message-ID:

I'm heading over to Searle 240b now to give an intro to Swift/T, ADLB and
MPI.

Is anyone joining remotely?

- Tim

On Fri, Mar 1, 2013 at 5:47 AM, Justin M Wozniak wrote:

> Great! I will set up a poll for the time. We'll work something out for
> screen sharing and/or recording.
> Justin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Fri Mar 22 13:53:01 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 22 Mar 2013 13:53:01 -0500
Subject: [Swift-devel] Weekly training sessions
In-Reply-To:
References: <605535474.2169579.1362138442159.JavaMail.root@mcs.anl.gov>
Message-ID:

Yes, me from Argonne. Is this going to be broadcast?

On Fri, Mar 22, 2013 at 1:48 PM, Tim Armstrong wrote:

> I'm heading over to Searle 240b now to give an intro to Swift/T, ADLB and
> MPI.
>
> Is anyone joining remotely?
>
> - Tim

--
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From davidk at ci.uchicago.edu Fri Mar 22 13:53:59 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 22 Mar 2013 13:53:59 -0500 (CDT)
Subject: [Swift-devel] Weekly training sessions
In-Reply-To:
Message-ID: <880889204.684786.1363978439155.JavaMail.root@ci.uchicago.edu>

I'd like to join remotely.. is screen sharing possible?

----- Original Message -----
> From: "Tim Armstrong"
> To: "Justin M Wozniak"
> Cc: "Glen Hocky" , "Swift Devel"
> Sent: Friday, March 22, 2013 1:48:10 PM
> Subject: Re: [Swift-devel] Weekly training sessions
>
> I'm heading over to Searle 240b now to give an intro to Swift/T, ADLB
> and MPI.
>
> Is anyone joining remotely?
>
> - Tim

From tim.g.armstrong at gmail.com Fri Mar 22 14:03:51 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 22 Mar 2013 14:03:51 -0500
Subject: [Swift-devel] Weekly training sessions
In-Reply-To: <880889204.684786.1363978439155.JavaMail.root@ci.uchicago.edu>
References: <880889204.684786.1363978439155.JavaMail.root@ci.uchicago.edu>
Message-ID:

I'm going to try join.me
The code is https://join.me/358-098-242

On Fri, Mar 22, 2013 at 1:53 PM, David Kelly wrote:
> I'd like to join remotely.. is screen sharing possible?

From lpesce at uchicago.edu Fri Mar 22 14:31:40 2013
From: lpesce at uchicago.edu (Lorenzo Pesce)
Date: Fri, 22 Mar 2013 14:31:40 -0500
Subject: [Swift-devel] [Swift-user] What system calls do the mappers use?
In-Reply-To: <1350534392.1832467.1363816799515.JavaMail.root@mcs.anl.gov>
References: <1350534392.1832467.1363816799515.JavaMail.root@mcs.anl.gov>
Message-ID:

> Just curious, what keyType values make sense? Not boolean or float, I assume.

They are strings, used later to locate files.
Basically there is a Sample (which is a structure), which among its parts has an array

type Sample{
    string sampleID;
    string dir;
    string [] RGfiles ;
}

I would then read it via a readData run (I am figuring out how to do that when the last thing to be read is an array whose length is unknown, but one problem at a time ;-))

> Not arrays, files, structures either, I assume? So really just int and strings are practical?

The ints, function calls and floats will come in later ;-)

>
> Thanks,
>
> - Mike
>
>
> ----- Original Message -----
>> From: "Mihael Hategan"
>> To: "Michael Wilde"
>> Cc: "Lorenzo Pesce" , "Swift User Discussion List"
>> Sent: Wednesday, March 20, 2013 4:52:17 PM
>> Subject: Re: [Swift-user] What system calls do the mappers use?
>>
>> They are in trunk. They should also be in 0.94.
>>
>> You declare them as:
>> valueType[keyType] arrayName;
>> For example:
>> int[string] a;
>> a["one"] = 1;
>>
>> Mihael
>>
>> On Wed, 2013-03-20 at 16:04 -0500, Michael Wilde wrote:
>>> Lorenzo,
>>>
>>> All Swift arrays are varying in size: you don't declare the array
>>> size in the declaration. Further, they can be sparse (because the
>>> implementation is in fact a hashtable).
>>>
>>> Swift has code that supports user-level hashes by declaring
>>> arrays with string instead of integer keys. I thought this made it
>>> to the User Guide but I see now that it did not. It's
>>> possible/likely that's because the code is not in trunk yet.
>>>
>>> Can anyone on the devel team reply with the status of associative
>>> arrays?
>>>
>>> Thanks,
>>>
>>> - Mike
>>>
>>> ----- Original Message -----
>>>> From: "Lorenzo Pesce"
>>>> To: "Michael Wilde"
>>>> Cc: "Swift User Discussion List"
>>>> Sent: Wednesday, March 20, 2013 3:43:19 PM
>>>> Subject: Re: [Swift-user] What system calls do the mappers use?
>>>>
>>>> Can one make hashes of arrays or arrays of arrays of different
>>>> sizes in swift?
>>>> e.g., an array of an array type of variable size?
>>>>
>>>> On Mar 20, 2013, at 3:41 PM, Michael Wilde wrote:
>>>>
>>>>>
>>>>> Also, to answer your question more directly: "I don't know".
>>>>> You can try to answer this by writing some very simple swift
>>>>> scripts that do the kinds of built-in mappings you are looking at,
>>>>> and use strace() with suitable filtering and grepping to see what
>>>>> Swift (via Java) is doing to implement the mapping.
>>>>>
>>>>> Mihael may be able to point you to the Java classes that do the
>>>>> mapping to distill this process further.
>>>>>
>>>>> - Mike
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Lorenzo Pesce"
>>>>>> To: "Swift User Discussion List"
>>>>>> Sent: Wednesday, March 20, 2013 3:27:05 PM
>>>>>> Subject: [Swift-user] What system calls do the mappers use?
>>>>>>
>>>>>> Hi --
>>>>>>
>>>>>> I am working with mappers that might be repeated thousands of
>>>>>> times in each workflow run.
>>>>>> Lustre doesn't like that type of search when it is based on
>>>>>> approaches similar to "ls", on the other hand "find" works
>>>>>> fine.
>>>>>>
>>>>>> I could conceivably find a workaround, but I would rather not
>>>>>> have to do it.
>>>>>>
>>>>>> Lorenzo
>>>>>> _______________________________________________
>>>>>> Swift-user mailing list
>>>>>> Swift-user at ci.uchicago.edu
>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From tim.g.armstrong at gmail.com Fri Mar 22 15:15:49 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 22 Mar 2013 15:15:49 -0500
Subject: [Swift-devel] Weekly training sessions
In-Reply-To:
References: <880889204.684786.1363978439155.JavaMail.root@ci.uchicago.edu>
Message-ID:

Here are the slides and the code examples I used for the presentation.

Cheers,
Tim

On Fri, Mar 22, 2013 at 2:03 PM, Tim Armstrong wrote:
> I'm going to try join.me
> The code is https://join.me/358-098-242

-------------- next part --------------
A non-text attachment was scrubbed...
Name: swiftt-devel-demo.zip
Type: application/zip
Size: 4539 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: swift-t-pres.pdf
Type: application/pdf
Size: 479390 bytes
Desc: not available
URL:

From yadudoc1729 at gmail.com Tue Mar 26 11:18:20 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Tue, 26 Mar 2013 21:48:20 +0530
Subject: [Swift-devel] MODIS freezes on Midway
In-Reply-To:
References: <1126490418.1755197.1363809914791.JavaMail.root@mcs.anl.gov> <1363813194.28485.7.camel@echo>
Message-ID:

Hi,

I think I've found the reason for the modis tests failing on beagle.
I had no default project set on beagle. The jobs go through once I've set the project to:
CI-CCR000013 Michael Wilde The Swift Parallel Scripting System

-Thanks,
Yadu

On Thu, Mar 21, 2013 at 11:16 PM, Ketan Maheshwari < ketancmaheshwari at gmail.com> wrote:
> Yadu,
>
> netcat/nc has been helpful in the past resolving network/port connectivity
> issues. I tried to write to midway port 50000 from beagle which worked:
>
> on midway:
>
> $ nc -l 50000
>
> on beagle:
>
> $ netcat 128.135.112.71 50000
> hello from beagle
>
> Gives on midway:
> $ nc -l 50000
> hello from beagle
>
> Hope this helps with debugging.
>
>
> On Thu, Mar 21, 2013 at 12:31 PM, Yadu Nand wrote:
>
>> @Ketan, I believe I spoke too early yesterday about the
>> /lustre/beagle/yadunand folder.
>> The error, No such file or directory, went away after the
>> yadunand/swiftwork directory was created manually.
>> @Mihael, I think it would really help if the directory creation could be
>> handled by swift.
>>
>> Yesterday, I tried some things David asked me to try, setting the
>> following:
>> GLOBUS_HOSTNAME=128.135.112.73
>> GLOBUS_TCP_PORT_RANGE=50000,51000
>> The issue here, as I understood it, is that the workers on beagle try to
>> connect back to midway to access the files, and the IP resolves to an
>> internal IP which they can't access.
>> This is happening despite setting the GLOBUS_HOSTNAME explicitly. Even
>> the port range that I've set does not seem to be used:
>>
>> Failed to start channel GSSCChannel-https://192.5.86.107:60851(2)[ https://192.5.86.107:60851]
>>
>> I'm also attaching a log from the beagle run.
>> The files used for the run are on
>> midway /home/yadunand/swiftdemo-trunk/test.beagle
>>
>> -Yadu
>>
>> On Thu, Mar 21, 2013 at 2:29 AM, Mihael Hategan wrote:
>>
>>> On Wed, 2013-03-20 at 15:05 -0500, Michael Wilde wrote:
>>> > Yadu, I don't think the beagle test requires you to have any
>>> > directories on beagle.
>>>
>>> We do the SWIFT_USERHOME thing on beagle, and that requires the modified
>>> user home to exist I think.
>>>
>>> Might be possible to attempt to create that directory automatically.
>>>
>>> Mihael
>>
>> --
>> Yadu Nand B
>
> --
> Ketan

--
Yadu Nand B

From iraicu at cs.iit.edu Fri Mar 29 23:19:55 2013
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Fri, 29 Mar 2013 23:19:55 -0500
Subject: [Swift-devel] CALL FOR PARTICIPATION: IEEE/ACM CCGrid 2013
Message-ID: <515667EB.90406@cs.iit.edu>

**** CALL FOR PARTICIPATION ****

***********************************************************
***                                                     ***
***   EARLY REGISTRATION DEADLINE: April 22, 2013       ***
***                                                     ***
***********************************************************

and

**** CALL FOR POSTERS ****

The 13th IEEE/ACM International Symposium on
Cluster, Cloud and Grid Computing (CCGrid 2013)

Delft University of Technology, Delft, the Netherlands
May 13-16, 2013
http://www.pds.ewi.tudelft.nl/ccgrid2013

CCGrid is a series of very successful conferences, sponsored by the IEEE Computer Society Technical Committee on Scalable Computing (TCSC) and the ACM, with the overarching goal of bringing together international researchers, developers, and users to provide an international forum to present leading research activities and results on a broad range of topics related to clusters, grids and clouds and their applications.

**** VENUE ****

The CCGrid 2013 conference will be held on the campus of Delft University of Technology, which was founded in 1842 by King William II and which is the oldest and largest technical university in the Netherlands. It is well established as one of the leading technical universities in the world.

Delft is a small, historical town dating back to the 13th century. Delft has many old buildings and small canals, and it has a lively atmosphere. The city offers a large variety of hotels and restaurants. Many other places of interest (e.g., Amsterdam and The Hague) are within one hour distance of traveling.

Traveling to Delft is easy. Delft is close to Amsterdam Schiphol Airport (60 km, 45 min by train), which has direct connections from all major airports in the world. Delft also has excellent train connections to the rest of Europe.

**** HIGHLIGHTS OF THE CONFERENCE PROGRAM ****

- A keynote by the winner of the IEEE Award for Excellence in Scalable Computing
  Speaker: Marc Snir, Argonne National Laboratory and University of Illinois at Urbana-Champaign, USA
  Title: Programming Models for High-Performance Computing
- Two additional keynote speakers:
  * Speaker: Simon Portegies Zwart, Leiden University, the Netherlands
    Title: The Astronomical Multipurpose Software Environment and the Ecology of Star Clusters
  * Speaker: Daniel A. Reed, University of Iowa, USA
    Title: Clusters, Grids and Clouds: A Look from Both Sides
- Four workshops and three tutorials on Monday, May 13
- 14 technical paper sessions
- A poster presentation and a poster session plus reception
- A panel on Cloud Computing
- A conference dinner on Wednesday, May 15

**** CALL FOR POSTERS ****

CCGrid 2013 offers conference attendees the opportunity to participate in the poster session on Tuesday afternoon. For details on how to submit a poster, please consult the conference website (look for web-published posters). The submission deadline is April 15, 2013.

**** GENERAL CHAIR ****
Dick Epema, Delft University of Technology, the Netherlands

**** PROGRAM CHAIR ****
Thomas Fahringer, University of Innsbruck, Austria

**** PROGRAM VICE-CHAIRS ****
Rosa Badia, Barcelona Supercomputing Center, Spain
Henri Bal, Vrije Universiteit, the Netherlands
Marios Dikaiakos, University of Cyprus, Cyprus
Kirk Cameron, Virginia Tech, USA
Daniel Katz, University of Chicago & Argonne Nat Lab, USA
Kate Keahey, Argonne National Laboratory, USA
Martin Schulz, Lawrence Livermore National Laboratory, USA
Douglas Thain, University of Notre Dame, USA
Cheng-Zhong Xu, Shenzhen Inst. of Advanced Techn, China

**** DOCTORAL SYMPOSIUM CO-CHAIRS ****
Yogesh Simmhan, University of Southern California, USA
Ana Varbanescu, Delft University of Technology, the Netherlands

**** SCALE CHALLENGE CO-CHAIRS ****
Alexandru Iosup, Delft University of Technology, the Netherlands
Douglas Thain, University of Notre Dame, USA

**** POSTERS CHAIR ****
Rob van Nieuwpoort, Netherlands eScience Center, the Netherlands

**** WORKSHOPS CO-CHAIRS ****
Shantenu Jha, Rutgers and Louisiana State University, USA
Ioan Raicu, Illinois Institute of Technology, USA

**** TUTORIALS CHAIR ****
Radu Prodan, University of Innsbruck, Austria

**** SUBMISSIONS AND PROCEEDINGS CHAIR ****
Pavan Balaji, Argonne National Laboratory, USA

**** FINANCE AND REGISTRATION CHAIR ****
Alexandru Iosup, Delft University of Technology, the Netherlands

**** PUBLICITY CO-CHAIRS ****
Nazareno Andrade, Universidade Federal de Campina Grande, Brazil
Gabriel Antoniu, INRIA, France
Bahman Javadi, University of Western Sydney, Australia
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA
Kin Choong Yow, Shenzhen Inst. of Advanced Technology, China

**** CYBER CHAIR ****
Stephen van der Laan, Delft University of Technology, the Netherlands

**** LOCAL ARRANGEMENTS ****
Esther van Rooijen, Delft University of Technology, the Netherlands

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Editor: IEEE TCC, Springer JoCCASA
Chair: IEEE/ACM MTAGS, ACM ScienceCloud, IEEE/ACM DataCloud
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
LinkedIn: http://www.linkedin.com/in/ioanraicu
Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ
=================================================================
=================================================================

From iraicu at cs.iit.edu Fri Mar 29 23:55:08 2013
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Fri, 29 Mar 2013 23:55:08 -0500
Subject: [Swift-devel] Call for Posters: ACM HPDC 2013
Message-ID: <5156702C.8070105@cs.iit.edu>

**** CALL FOR POSTERS ****

The 22nd International ACM Symposium on
High-Performance Parallel and Distributed Computing (HPDC'13)

New York City, USA - June 17-21, 2013
http://www.hpdc.org/2013

The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) is the premier annual conference on the design, the implementation, the evaluation, and the use of parallel and distributed systems for high-end computing. HPDC'13 will take place in the heart of iconic New York City from June 17-21. The conference will be held on June 19-21 (Wednesday to Friday), with affiliated workshops taking place on June 17-18 (Monday and Tuesday).

HPDC'13 will feature a poster session that will provide the right environment for lively and informal discussions on various high performance parallel and distributed computing topics. We *invite all potential authors* to submit their contribution to this poster session in the form of a two-page PDF abstract (we recommend using the ACM Proceedings style, and fonts not smaller than 10 point). Posters may be accompanied by practical demonstrations.

Participating posters will be selected based on the following criteria:

- Submissions must describe new, interesting ideas on any HPDC topics of interest
- Submissions can present work in progress, but we strongly encourage the authors to include preliminary experimental results, if available
- Student submissions meeting the above criteria will be given preference

Please provide the following information in your PDF file:

- Poster title
- Author names, affiliations, and email addresses
- Note which authors, if any, are students
- Indicate if you plan to set up a demo with your poster (the authors and organizers need to agree that the requirements for the demo to function can be met at the site of the poster exhibition)

Abstracts must be submitted through EasyChair (https://www.easychair.org/conferences/?conf=hpdc13posters) *before May 15, 2013, 23:59 EDT*. Authors will be notified of acceptance or rejection via e-mail by May 20, 2013. No reviews will be provided.

Posters will be published online on the conference website. Each poster will have an A0 panel in the poster exhibition area, which will also include posters of the HPDC accepted papers.

The *poster session* will be held on Wednesday, June 19, in the late afternoon, and it will start with a poster advertising session during which the author(s) of each poster will give a very short presentation (2 slides, 1-2 minutes) of their poster. Following these presentations, the poster exhibition will be opened for visiting and, we hope, for fruitful discussions. Therefore, we kindly request at least one author of each poster to be present throughout the entire session.

For any questions about the submission, selection, and presentation of the accepted posters, please contact the Posters Chair -- Ivan Rodero, Rutgers University.

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Editor: IEEE TCC, Springer JoCCASA
Chair: IEEE/ACM MTAGS, ACM ScienceCloud, IEEE/ACM DataCloud
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
LinkedIn: http://www.linkedin.com/in/ioanraicu
Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ
=================================================================
=================================================================